Oligonucleotide mediated nucleic acid recombination

ABSTRACT

Methods of recombining nucleic acids, including homologous nucleic acids, are provided. Families of gene shuffling oligonucleotides and their use in recombination procedures, as well as polymerase and ligase mediated recombination methods are also provided.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., U.S. Ser. No. 09/408,392, filed Sep. 28, 1999, which is a non-provisional of “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., U.S. Ser. No. 60/118,813, filed Feb. 5, 1999 and which is also a non-provisional of “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., U.S. Ser. No. 60/141,049, filed Jun. 24, 1999.

This application is also a continuation-in-part of “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US, filed herewith, which is a continuation-in-part of “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., U.S. Ser. No. 09/416,375, filed Oct. 12, 1999, which is a non-provisional of “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov and Stemmer, U.S. Ser. No. 60/116,447, filed Jan. 19, 1999 and which is also a non-provisional of “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov and Stemmer, U.S. Ser. No. 60/118,854, filed Feb. 5, 1999.

This application is also a continuation-in-part of co-filed application “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer, Attorney Docket Number 3271.002WO0 (filed by Majestic, Parsons, Siebert & Hsue) which is a continuation-in-part of “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer, U.S. Ser. No. 09/416,837, filed Oct. 12, 1999.

This application is also related to “USE OF CODON VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., U.S. Ser. No. 09/408,393, filed Sep. 28, 1999.

The present application claims priority to and benefit of each of the applications listed in this section, as provided for under 35 U.S.C. §119(e) and/or 35 U.S.C. §120, as appropriate.

COPYRIGHT NOTIFICATION

Pursuant to 37 C.F.R. 1.71(e), Applicants note that a portion of this disclosure contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

DNA shuffling has provided a paradigm shift in recombinant nucleic acid generation, manipulation and selection. The inventors and their co-workers have developed fast artificial evolution methodologies for generating improved industrial, agricultural, and therapeutic genes and encoded proteins. These methods, and related compositions and apparatus for practicing these methods, represent a pioneering body of work by the inventors and their co-workers.

A number of publications by the inventors and their co-workers describe DNA shuffling. For example, Stemmer et al. (1994) “Rapid Evolution of a Protein” Nature 370:389-391; Stemmer (1994) “DNA Shuffling by Random Fragmentation and Reassembly: in vitro Recombination for Molecular Evolution,” Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer U.S. Pat. No. 5,603,793 METHODS FOR IN VITRO RECOMBINATION; Stemmer et al. U.S. Pat. No. 5,830,721 DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY; Stemmer et al., U.S. Pat. No. 5,811,238 METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION describe, e.g., in vitro and in vivo nucleic acid, DNA and protein shuffling in a variety of formats, e.g., by repeated cycles of mutagenesis, shuffling and selection, as well as methods of generating libraries of displayed peptides and antibodies.

Applications of DNA shuffling technology have also been developed by the inventors and their co-workers. In addition to the publications noted above, Minshull et al., U.S. Pat. No. 5,837,458 METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING provides, e.g., for the evolution of metabolic pathways and the enhancement of bioprocessing through recursive shuffling techniques. Crameri et al. (1996), “Construction And Evolution Of Antibody-Phage Libraries By DNA Shuffling” Nature Medicine 2(1):100-103 describe, e.g., antibody shuffling for antibody phage libraries. Additional details regarding DNA Shuffling can be found in WO95/22625, WO97/20078, WO96/33207, WO97/33957, WO98/27230, WO97/35966, WO98/31837, WO98/13487, WO98/13485 and WO989/42832, as well as a number of other publications by the inventors and their co-workers.

A number of the publications of the inventors and their co-workers, as well as other investigators in the art, also describe techniques which facilitate DNA shuffling, e.g., by providing for reassembly of genes from small fragments, or even oligonucleotides. For example, in addition to the publications noted above, Stemmer et al. (1998) U.S. Pat. No. 5,834,252 END COMPLEMENTARY POLYMERASE REACTION describe processes for amplifying and detecting a target sequence (e.g., in a mixture of nucleic acids), as well as for assembling large polynucleotides from nucleic acid fragments.

Review of the foregoing publications reveals that forced evolution by gene shuffling is an important new technique with many practical and powerful applications. Thus, new techniques which facilitate gene shuffling are highly desirable. The present invention provides significant new gene shuffling protocols, as well as many other features which will be apparent upon complete review of this disclosure.

SUMMARY OF THE INVENTION

The invention provides oligonucleotide assisted shuffling of nucleic acids. These oligonucleotide assisted approaches particularly facilitate family shuffling procedures, providing substantially simplified shuffling protocols which can be used to produce family shuffled nucleic acids without isolating or cloning full-length homologous nucleic acids. Furthermore, the oligonucleotide assisted approaches herein can even be extended to shuffling non-homologous nucleic acids, thereby accessing greater sequence space in resulting recombinant molecules and, thus, greater molecular diversity. The techniques can also be combined with classical DNA shuffling protocols, such as DNase-mediated methods, or with other diversity generation procedures such as classical mutagenesis, to increase the versatility and throughput of these methods.

Several methods which are applicable to family shuffling procedures are provided. In one aspect of these methods, sets of overlapping family gene shuffling oligonucleotides are hybridized and elongated, providing a population of recombined nucleic acids, which can be selected for a desired trait or property. Typically, the set of overlapping family gene shuffling oligonucleotides includes a plurality of oligonucleotide member types which have consensus region subsequences derived from a plurality of homologous target nucleic acids. The oligo sets optionally provide other distinguishing features, including cross-over capability, codon-variation or selection, and the like.

The population of recombined nucleic acids can be denatured, providing denatured recombined nucleic acids which can then be reannealed. The resulting recombinant nucleic acids can also be selected. Any or all of these steps can be repeated reiteratively, providing for multiple recombination and selection events to produce a nucleic acid with a desired trait or property.

In a related aspect, methods for introducing nucleic acid family diversity during nucleic acid recombination are performed by providing a composition having at least one set of fragmented nucleic acids which includes a population of family gene shuffling oligonucleotides and recombining at least one of the fragmented nucleic acids with at least one of the family gene shuffling oligonucleotides. A recombinant nucleic acid having a nucleic acid subsequence corresponding to the at least one family gene shuffling oligonucleotide is then regenerated, typically to encode a full-length molecule (e.g., a full-length protein).

Typically, family gene shuffling oligonucleotides are provided by aligning homologous nucleic acid sequences to select conserved regions of sequence identity and regions of sequence diversity. A plurality of family gene shuffling oligonucleotides are synthesized (serially or in parallel) which correspond to at least one region of sequence diversity. In contrast, sets of fragments are provided by cleaving one or more homologous nucleic acids (e.g., with a DNase), or by synthesizing a set of oligonucleotides corresponding to a plurality of regions of at least one nucleic acid (typically oligonucleotides corresponding to a full-length nucleic acid are provided as members of a set of nucleic acid fragments). In the shuffling procedures herein, these cleavage fragments can be used in conjunction with family gene shuffling oligonucleotides, e.g., in one or more recombination reactions.

Recursive methods of oligonucleotide shuffling are provided. As noted herein, recombinant nucleic acids generated synthetically using oligonucleotides can be cleaved and shuffled by standard nucleic acid shuffling methodologies, or the nucleic acids can be sequenced and used to design a second set of family shuffling oligonucleotides which are used to recombine the recombinant nucleic acids. Either, or both, of these recursive techniques can be used for subsequent rounds of recombination and can also be used in conjunction with rounds of selection of recombinant products. Selection steps can follow one or several rounds of recombination, depending on the desired diversity of the recombinant nucleic acids (the more rounds of recombination which are performed, the more diverse the resulting population of recombinant nucleic acids).

The use of family gene shuffling oligonucleotides in recombination reactions herein provides for domain switching of domains of sequence identity or diversity between homologous nucleic acids, e.g., where recombinants resulting from the recombination reaction provide recombinant nucleic acids with a sequence domain from a first nucleic acid embedded within a sequence corresponding to a second nucleic acid, e.g., where the region most similar to the embedded region from the second nucleic acid is not present in the recombinant nucleic acid.

One particular advantage of the present invention is the ability to recombine homologous nucleic acids with low sequence similarity, or even to recombine non-homologous nucleic acids. In these methods, one or more sets of fragmented nucleic acids are recombined with a set of crossover family diversity oligonucleotides. Each of these crossover oligonucleotides has a plurality of sequence diversity domains corresponding to a plurality of sequence diversity domains from homologous or non-homologous nucleic acids with low sequence similarity. The fragmented oligonucleotides, which are derived from one or more homologous or non-homologous nucleic acids, can hybridize to one or more regions of the crossover oligos, facilitating recombination.

Methods of family shuffling PCR amplicons using family diversity oligonucleotide primers are also provided. In these methods, a plurality of non-homogeneous homologous template nucleic acids are provided. A plurality of PCR primers which hybridize to a plurality of the plurality of non-homogeneous homologous template nucleic acids are also provided. A plurality of PCR amplicons are produced by PCR amplification of the plurality of template nucleic acids with the plurality of PCR primers, which are then recombined. Typically, sequences for the PCR primers are selected by aligning sequences for the plurality of non-homogeneous homologous template nucleic acids and selecting PCR primers which correspond to regions of sequence similarity.

A variety of compositions for practicing the above methods and which result from practicing the above methods are also provided. Compositions which include a library of oligonucleotides having a plurality of oligonucleotide member types are one example. The library can include at least about 2, 3, 5, 10, 20, 30, 40, 50, 100 or more different oligonucleotide members. The oligonucleotide member types correspond to a plurality of subsequence regions of a plurality of members of a selected set of a plurality of homologous target sequences. The plurality of subsequence regions can include, e.g., a plurality of overlapping or non-overlapping sequence regions of the selected set of homologous target sequences. The oligonucleotide member types typically each have a sequence identical to at least one subsequence from at least one of the selected set of homologous target sequences. Any of the oligonucleotide types and sets described above, or elsewhere herein, can be included in the compositions of the invention (e.g., family shuffling oligonucleotides, crossover oligonucleotides, domain switching oligonucleotides, etc.). The oligonucleotide member types can include a plurality of homologous oligonucleotides corresponding to a homologous region from the plurality of homologous target sequences. In this embodiment, each of the plurality of homologous oligonucleotides has at least one variant subsequence. Libraries of nucleic acids and encoded proteins which result from practicing oligonucleotide-mediated recombination as noted herein are also a feature of the invention.

Compositions optionally include components which facilitate recombination reactions, e.g., a polymerase, such as a thermostable DNA polymerase (e.g., Taq, Vent or any of the many other commercially available polymerases), a recombinase, a nucleic acid synthesis reagent, buffers, salts, magnesium, one or more nucleic acids having one or more of the plurality of members of the selected set of homologous target sequences, and the like.

Kits comprising the compositions of the invention, e.g., in containers, or other packaging materials, e.g., with instructional materials for practicing the methods of the invention, are also provided. Uses for the compositions and kits herein for practicing the methods are also provided.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic showing oligonucleotide-directed in vivo shuffling using chimeraplasts.

FIG. 2 is a schematic of a low-homology shuffling procedure to provide for synthetic gene blending.

FIG. 3 is a schematic of a modular exon deletion/insertion library.

Definitions

Unless otherwise indicated, the following definitions supplement those in the art.

Nucleic acids are “homologous” when they are derived, naturally or artificially, from a common ancestor sequence. During natural evolution, this occurs when two or more descendent sequences diverge from a parent sequence over time, i.e., due to mutation and natural selection. Under artificial conditions, divergence occurs, e.g., in one of two basic ways. First, a given sequence can be artificially recombined with another sequence, as occurs, e.g., during typical cloning, to produce a descendent nucleic acid, or a given sequence can be chemically modified, or otherwise manipulated to modify the resulting molecule. Alternatively, a nucleic acid can be synthesized de novo, by synthesizing a nucleic acid which varies in sequence from a selected parental nucleic acid sequence. When there is no explicit knowledge about the ancestry of two nucleic acids, homology is typically inferred by sequence comparison between two sequences. Where two nucleic acid sequences show sequence similarity over a significant portion of each of the nucleic acids, it is inferred that the two nucleic acids share a common ancestor. The precise level of sequence similarity which establishes homology varies in the art depending on a variety of factors. For purposes of the present invention, cladistic intermediates (proposed sequences which share features of two or more related nucleic acids) are homologous nucleic acids.

For purposes of this disclosure, two nucleic acids are considered homologous where they share sufficient sequence identity to allow direct recombination to occur between the two nucleic acid molecules. Typically, nucleic acids utilize regions of close similarity spaced roughly the same distance apart to permit recombination to occur. The recombination can be in vitro or in vivo.

It should be appreciated, however, that one advantage of certain features of the invention is the ability to recombine more distantly related nucleic acids than standard recombination techniques permit. In particular, sequences from two nucleic acids which are distantly related, or even not detectably related, can be recombined using cross-over oligonucleotides which have subsequences from two or more different non-homologous target nucleic acids, or two or more distantly related nucleic acids. However, where the two nucleic acids can only be indirectly recombined using oligonucleotide intermediates as set forth herein, they are considered to be “non-homologous” for purposes of this disclosure.

A “set” as used herein refers to a collection of at least two molecule types, and typically includes at least about, e.g., 5, 10, 50, 100, 500, 1,000 or more members, depending on the precise intended use of the set.

A set of “family gene shuffling oligonucleotides” is a set of synthesized oligonucleotides derived from a selected set of homologous nucleic acids. The oligonucleotides are derived from a selected set of homologous nucleic acids when they (individually or collectively) have regions of sequence identity (and, optionally, regions of sequence diversity) with more than one of the homologous nucleic acids. Collectively, the oligonucleotides typically correspond to a substantial portion of the full length of the homologous nucleic acids of the set of homologous nucleic acids, e.g., the oligonucleotides correspond over a substantial portion of the length of the homologous nucleic acids (e.g., the oligonucleotides of the set collectively correspond to e.g., 25% or more, often 35% or more, generally 50% or more, typically 60% or more, more typically 70% or more, and in some applications, 80%, 90% or 100% of the full-length of each of the homologous nucleic acids). Most commonly, the family gene shuffling oligonucleotides include multiple member types, each having regions of sequence identity to at least one member of the selected set of homologous nucleic acids (e.g., about 2, 3, 5, 10, 50 or more member types).
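The collective coverage percentages recited above can be illustrated with a minimal Python sketch. The function name `coverage_fraction` and the example sequences are hypothetical and are not part of the disclosure; the sketch assumes exact, same-strand matches only and ignores complementary-strand oligonucleotides and mismatched variants.

```python
def coverage_fraction(parent, oligos):
    """Fraction of parent positions covered by at least one exact-match oligo.

    Hypothetical helper: only exact, same-strand matches are counted.
    """
    if not parent:
        return 0.0
    covered = [False] * len(parent)
    for oligo in oligos:
        start = parent.find(oligo)
        while start != -1:
            # Mark every position spanned by this occurrence of the oligo.
            for i in range(start, start + len(oligo)):
                covered[i] = True
            start = parent.find(oligo, start + 1)
    return sum(covered) / len(parent)


# Illustrative data only: a 24-base parent and three oligos.
parent = "ATGGCTAGCTAGGATCCGATCGTA"
oligos = ["ATGGCTAG", "CTAGGATC", "GATCGTA"]
fraction = coverage_fraction(parent, oligos)
print(f"{fraction:.1%} of the parent is covered")
```

In this toy example a single position of the parent is left uncovered, so the set corresponds to somewhat less than 100% of the full length.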

A “cross-over” oligonucleotide has regions of sequence identity to at least two different members of a selected set of nucleic acids, which are optionally homologous or non-homologous.
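As a rough illustration of this definition, the following Python sketch builds a cross-over oligonucleotide sequence whose 5′ portion is identical to one parent and whose 3′ portion is identical to a second parent. The function name, the `junction` and `flank` parameters, and the assumption that both parents share a common coordinate system are illustrative choices, not features recited in the disclosure.

```python
def make_crossover_oligo(seq_a, seq_b, junction, flank=10):
    """Return flank bases of seq_a ending at junction, then flank bases of
    seq_b starting at junction (hypothetical helper; shared coordinates assumed).
    """
    if junction - flank < 0 or junction + flank > len(seq_b) or junction > len(seq_a):
        raise ValueError("junction and flank must lie within both sequences")
    left = seq_a[junction - flank:junction]    # region identical to parent A
    right = seq_b[junction:junction + flank]   # region identical to parent B
    return left + right


# Illustrative parents with no detectable similarity.
parent_a = "AAAAAAAAAACCCCCCCCCC"
parent_b = "GGGGGGGGGGTTTTTTTTTT"
print(make_crossover_oligo(parent_a, parent_b, junction=10, flank=5))
```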

Nucleic acids “hybridize” when they associate, typically in solution. Nucleic acids hybridize due to a variety of well characterized physico-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes part I chapter 2 “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, New York, as well as in Ausubel, supra.

Two nucleic acids “correspond” when they have the same or complementary sequences, or when one nucleic acid is a subsequence of the other, or when one sequence is derived, by natural or artificial manipulation, from the other.

Nucleic acids are “elongated” when additional nucleotides (or other analogous molecules) are incorporated into the nucleic acid. Most commonly, this is performed with a polymerase (e.g., a DNA polymerase), e.g., a polymerase which adds sequences at the 3′ terminus of the nucleic acid.

Two nucleic acids are “recombined” when sequences from each of the two nucleic acids are combined in a progeny nucleic acid. Two sequences are “directly” recombined when both of the nucleic acids are substrates for recombination. Two sequences are “indirectly recombined” when the sequences are recombined using an intermediate such as a cross-over oligonucleotide. For indirect recombination, no more than one of the sequences is an actual substrate for recombination, and in some cases, neither sequence is a substrate for recombination (i.e., when one or more oligonucleotide(s) corresponding to the nucleic acids are hybridized and elongated).

A collection of “fragmented nucleic acids” is a collection of nucleic acids derived by cleaving one or more parental nucleic acids (e.g., with a nuclease, or via chemical cleavage), or by producing subsequences of the parental sequences in any other manner, such as partial chain elongation of a complementary nucleic acid.

A “full-length protein” is a protein having substantially the same sequence domains as a corresponding protein encoded by a natural gene. The protein can have modified sequences relative to the corresponding naturally encoded protein (e.g., due to recombination and selection), but is at least 95% as long as the naturally encoded protein.

A “DNase enzyme” is an enzyme such as DNase I which catalyzes cleavage of a DNA, in vitro or in vivo. A wide variety of DNase enzymes are well known and described, e.g., in Sambrook, Berger and Ausubel (all supra) and many are commercially available.

A “nucleic acid domain” is a nucleic acid region or subsequence. The domain can be conserved or not conserved between a plurality of homologous nucleic acids.

Typically, a domain is delineated by comparison between two or more sequences, i.e., a region of sequence diversity between sequences is a “sequence diversity domain,” while a region of similarity is a “sequence similarity domain.” “Domain switching” refers to the ability to switch one nucleic acid region from one nucleic acid with a second domain from a second nucleic acid.

A region of “high sequence similarity” refers to a region that is 90% or more identical to a second selected region when aligned for maximal correspondence (e.g., manually or using the common program BLAST set to default parameters). A region of “low sequence similarity” is 60% or less identical, more preferably, 40% or less identical to a second selected region, when aligned for maximal correspondence (e.g., manually or using BLAST set with default parameters).
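The 90% and 60% thresholds above can be applied to a pair of pre-aligned regions as in the following minimal Python sketch. It assumes the two regions have already been aligned to equal length with "-" marking gaps, and that gap columns count as non-identical; both are simplifying assumptions, and the function names and the "intermediate" label are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both regions.

    Assumes equal-length, pre-aligned strings; gap columns ('-') never match.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned regions must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def classify_similarity(aligned_a, aligned_b):
    """Label a region pair using the 90% / 60% thresholds defined above."""
    identity = percent_identity(aligned_a, aligned_b)
    if identity >= 90.0:
        return "high"
    if identity <= 60.0:
        return "low"
    return "intermediate"  # label not used in the text; illustrative only


print(classify_similarity("ACGTACGTAC", "ACGTACGTAA"))  # 9 of 10 columns identical
```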

A “PCR amplicon” is a nucleic acid made using the polymerase chain reaction (PCR). Typically, the nucleic acid is a copy of a selected nucleic acid. A “PCR primer” is a nucleic acid which hybridizes to a template nucleic acid and permits chain elongation using a thermostable polymerase under appropriate reaction conditions.

A “library of oligonucleotides” is a set of oligonucleotides. The set can be pooled, or can be individually accessible. Oligonucleotides can be DNA, RNA or combinations of RNA and DNA (e.g., chimeraplasts).

DETAILED DISCUSSION OF THE INVENTION

The present invention relates to improved formats for nucleic acid shuffling. In particular, by using selected oligonucleotide sets as substrates for recombination and/or gene synthesis, it is possible to dramatically speed the shuffling process. Moreover, it is possible to use oligonucleotide intermediates to indirectly recombine nucleic acids which could not otherwise be recombined. Direct access to physical nucleic acids corresponding to sequences to be combined is not necessary, as the sequences can be recombined indirectly through oligonucleotide intermediates.

In brief, a family of homologous nucleic acid sequences is first aligned, e.g., using available computer software, to select regions of identity/similarity and regions of diversity. A plurality (e.g., 2, 5, 10, 20, 50, 75, or 100 or more) of oligonucleotides corresponding to at least one region of diversity (and ordinarily at least one region of similarity) are synthesized. These oligonucleotides can be shuffled directly, or can be recombined with one or more of the family of nucleic acids.
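The align-then-select step outlined above can be sketched in Python as follows. The sketch assumes the family has already been aligned into equal-length, gap-free strings, treats a column as conserved only when every sequence carries the same base, and represents each region of diversity by one oligonucleotide sequence per distinct variant, extended by a few flanking conserved bases. The function names and the `flank` parameter are hypothetical.

```python
def find_regions(aligned):
    """Split an alignment into runs of conserved and diverse columns.

    Returns (kind, start, end) tuples with end exclusive.
    """
    if not aligned or any(len(s) != len(aligned[0]) for s in aligned):
        raise ValueError("need a non-empty alignment of equal-length sequences")
    regions = []
    for i in range(len(aligned[0])):
        column = {s[i] for s in aligned}
        kind = "conserved" if len(column) == 1 else "diverse"
        if regions and regions[-1][0] == kind:
            regions[-1] = (kind, regions[-1][1], i + 1)  # extend the current run
        else:
            regions.append((kind, i, i + 1))
    return regions


def diversity_oligos(aligned, flank=2):
    """Map each diverse region to its distinct variants plus conserved flanks."""
    oligos = {}
    for kind, start, end in find_regions(aligned):
        if kind == "diverse":
            lo = max(0, start - flank)
            hi = min(len(aligned[0]), end + flank)
            oligos[(start, end)] = sorted({s[lo:hi] for s in aligned})
    return oligos


family = ["ATGCCGTA", "ATGAAGTA", "ATGCTGTA"]  # illustrative aligned homologs
print(find_regions(family))
print(diversity_oligos(family))
```

The conserved flanks are what would let such oligonucleotides overlap with neighbours during reassembly; real designs would use much longer flanks than this toy example.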

This oligonucleotide-based recombination of related nucleic acids can be combined with a number of available standard shuffling methods. For example, there are several procedures now available for shuffling homologous nucleic acids, such as by digesting the nucleic acids with a DNase, permitting recombination to occur and then regenerating full-length templates, e.g., as described in Stemmer (1998) DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY U.S. Pat. No. 5,830,721. Thus, in one embodiment of the invention, a full-length nucleic acid which is identical to, or homologous with, at least one of the homologous nucleic acids is provided, cleaved with a DNase, and the resulting set of nucleic acid fragments are recombined with the plurality of family gene shuffling oligonucleotides. This combination of methods can be advantageous, because the DNase-cleavage fragments form a “scaffold” which can be reconstituted into a full length sequence, an advantage in the event that one or more synthesized oligos in the synthesized set is defective.

However, one advantage of the present invention is the ability to recombine several regions of diversity among homologous nucleic acids, even without the homologous nucleic acids, or cleaved fragments thereof, being present in the recombination mixture. Resulting shuffled nucleic acids can include regions of diversity from different nucleic acids, providing for the ability to combine different diversity domains in a single nucleic acid. This provides a very powerful method of accessing natural sequence diversity.

In general, the methods herein provide for “oligonucleotide mediated shuffling” in which oligonucleotides corresponding to a family of related homologous nucleic acids are recombined to produce selectable nucleic acids. The technique can be used to recombine homologous or even non-homologous nucleic acid sequences. When recombining homologous nucleic acids, sets of overlapping family gene shuffling oligonucleotides (which are derived, e.g., by comparison of homologous nucleic acids and synthesis of oligonucleotide fragments) are hybridized and elongated (e.g., by reassembly PCR), providing a population of recombined nucleic acids, which can be selected for a desired trait or property. Typically, the set of overlapping family gene shuffling oligonucleotides includes a plurality of oligonucleotide member types which have consensus region subsequences derived from a plurality of homologous target nucleic acids.
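A highly simplified in silico analogy of hybridizing and elongating overlapping oligonucleotides is sketched below in Python. It is not a model of reassembly PCR: it assumes equal-length parents that share a coordinate system, single-stranded tiles, and joining only where the overlap is a perfect match. The function names `tile` and `assemble`, the tile size, and the overlap length are all hypothetical.

```python
import random


def tile(seq, size, overlap):
    """Cut seq into tiles of length size that overlap by overlap bases."""
    step = size - overlap
    return [seq[i:i + size] for i in range(0, len(seq) - overlap, step)]


def assemble(parents, size=8, overlap=4, rng=None):
    """Build one chimera by chaining tiles drawn from equal-length parents.

    At each step a tile is chosen at random among those whose leading overlap
    exactly matches the tail of the growing product, a crude stand-in for
    hybridization followed by elongation.
    """
    rng = rng or random.Random()
    tilings = [tile(p, size, overlap) for p in parents]
    product = rng.choice(tilings)[0]
    for idx in range(1, len(tilings[0])):
        tail = product[-overlap:]
        candidates = [t[idx] for t in tilings if t[idx][:overlap] == tail]
        product += rng.choice(candidates)[overlap:]
    return product


# Illustrative parents: conserved core, divergent ends.
p1 = "AAAACGTACGTACGTATTTT"
p2 = "CCCCCGTACGTACGTAGGGG"
print(assemble([p1, p2], rng=random.Random(1)))
```

Because the overlaps in this example fall in conserved sequence, a product can carry the 5′ end of one parent and the 3′ end of the other, which is the recombination outcome the paragraph above describes.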

Typically, family gene shuffling oligonucleotides are provided by aligning homologous nucleic acid sequences to select conserved regions of sequence identity and regions of sequence diversity. A plurality of family gene shuffling oligonucleotides are synthesized (serially or in parallel) which correspond to at least one region of sequence diversity.

Sets of fragments, or subsets of fragments used in oligonucleotide shuffling approaches can be partially provided by cleaving one or more homologous nucleic acids (e.g., with a DNase), as well as by synthesizing a set of oligonucleotides corresponding to a plurality of regions of at least one nucleic acid (typically oligonucleotides corresponding to a partial or full-length nucleic acid are provided as members of the set of nucleic acid “fragments,” a term which encompasses both cleavage fragments and synthesized oligonucleotides). In the shuffling procedures herein, these cleavage fragments can be used in conjunction with family gene shuffling oligonucleotides, e.g., in one or more recombination reactions to produce recombinant nucleic acids.

The following provides details and examples regarding sequence alignment, oligonucleotide construction and library generation, shuffling procedures and other aspects of the present invention.

Aligning Homologous Nucleic Acid Sequences to Select Conserved Regions of Sequence Identity and Regions of Sequence Diversity

In one aspect, the invention provides for alignment of nucleic acid sequences to determine regions of sequence identity or similarity and regions of diversity. The set of overlapping family gene shuffling oligonucleotides can comprise a plurality of oligonucleotide member types which comprise consensus region subsequences derived from a plurality of homologous target nucleic acids. These consensus region subsequences are determined by aligning homologous nucleic acids and identifying regions of identity or similarity.

In one embodiment, homologous nucleic acid sequences are aligned, and at least one conserved region of sequence identity and a plurality of regions of sequence diversity are selected. The plurality of regions of sequence diversity provide a plurality of domains of sequence diversity. Typically, a plurality of family gene shuffling oligonucleotides corresponding to the plurality of domains of sequence diversity are synthesized and used in the various recombination protocols noted herein or which are otherwise available. Genes synthesized by these recombination methods are optionally further screened or further diversified by any available method, including recombination and/or mutagenesis.

Alignment of Homologous Nucleic Acids

Typically, the invention comprises first aligning identical nucleic acids, or regions of nucleic acid similarity, e.g., for sequences available from any of the publicly available or proprietary nucleic acid databases. Public database/search services include Genbank®, Entrez®, EMBL, DDBJ and those provided by the NCBI. Many additional sequence databases are available on the internet or on a contract basis from a variety of companies specializing in genomic information generation and/or storage.

The terms “identical” or percent “identity,” in the context of two or more nucleic acid or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence, as measured using one of the sequence comparison algorithms described below (or other algorithm available to persons of skill) or by visual inspection.

The phrase “substantially identical,” in the context of two nucleic acids or polypeptides, refers to two or more sequences or subsequences that have at least about 50%, preferably 80%, most preferably 90-95% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection. Such “substantially identical” sequences are typically considered to be homologous.

For sequence comparison and homology determination, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally, Ausubel et al., infra).
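For orientation, a global alignment score in the style of Needleman & Wunsch can be computed by dynamic programming, as in the following minimal Python sketch. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are illustrative assumptions; the packaged implementations named above use their own scoring schemes and gap models.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Illustrative scoring only: +1 match, -1 mismatch, -1 per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # three matches and one gap
```

The local method of Smith & Waterman differs mainly in flooring each cell at zero and taking the best cell anywhere in the matrix rather than the final corner.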

One example algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915).
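
The extension step described above can be illustrated with a minimal sketch. It is not the BLAST implementation: it assumes an ungapped, rightward-only extension from an exactly matching seed, and the function name `extend_right` and the default `x_drop` value are hypothetical. The reward M=5 and penalty N=−4 are the BLASTN defaults quoted above.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=20):
    """Extend a seed word hit to the right without gaps (illustrative only).

    Assumes the seed of length word_len at query[q_start:] and
    subject[s_start:] is an exact match. Returns (best_score, best_length).
    """
    score = match * word_len          # score contributed by the seed itself
    best_score, best_len = score, word_len
    i, j = q_start + word_len, s_start + word_len
    while i < len(query) and j < len(subject):   # stop at the end of either sequence
        score += match if query[i] == subject[j] else mismatch
        if score > best_score:
            best_score, best_len = score, i - q_start + 1
        # Halt when the score has fallen by X from its maximum,
        # or when the cumulative score reaches zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
        i += 1
        j += 1
    return best_score, best_len
```

For instance, `extend_right("ACGTACGTAAAA", "ACGTACGTCCCC", 0, 0, 4, x_drop=8)` extends the four-base seed across the next four matching bases and stops once two mismatches have lowered the score by 8.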

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul (1993) Proc. Nat'l. Acad. Sci. USA 90:5873-5787). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence (and, therefore, likely homologous) if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001. Other available sequence alignment programs include, e.g., PILEUP.

A number of additional sequence alignment protocols can be found, e.g., in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US, filed herewith.

Oligonucleotide Synthesis

In one aspect, the invention comprises synthesizing a plurality of family gene shuffling oligonucleotides, e.g., corresponding to at least one region of sequence diversity. Typically sets of family gene shuffling oligonucleotides are produced, e.g., by sequential or parallel oligonucleotide synthesis protocols.

Oligonucleotides, e.g., whether for use in in vitro amplification/gene reconstruction/reassembly methods, or to provide sets of family gene shuffling oligonucleotides, are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20):1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168. A wide variety of equipment is commercially available for automated oligonucleotide synthesis. Multi-nucleotide synthesis approaches (e.g., tri-nucleotide synthesis), as discussed, supra, are also useful.

Moreover, essentially any nucleic acid can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), ExpressGen Inc. (www.expressgen.com), Operon Technologies Inc. (Alameda, Calif.) and many others.

Synthetic Library Assembly

Libraries of family gene shuffling oligonucleotides are provided. For example, homologous genes of interest are aligned using a sequence alignment program such as BLAST, as described above. Nucleotides corresponding to amino acid variations between the homologs are noted. Oligos for synthetic gene shuffling are designed which comprise one (or more) nucleotide difference to any of the aligned homologous sequences, i.e., oligos are designed that are identical to a first nucleic acid, but which incorporate a residue at a position which corresponds to a residue of a nucleic acid homologous, but not identical, to the first nucleic acid.

Preferably, all of the oligonucleotides of a selected length (e.g., about 20, 30, 40, 50, 60, 70, 80, 90, or 100 or more nucleotides) which incorporate all possible nucleic acid variants are made. This includes X oligonucleotides per X sequence variations, where X is the number of different sequences at a locus. The X oligonucleotides are largely identical in sequence, except for the nucleotide(s) representing the variant nucleotide(s). Because of this similarity, it can be advantageous to utilize parallel or pooled synthesis strategies in which a single synthesis reaction or set of reagents is used to make common portions of each oligonucleotide. This can be performed e.g., by well-known solid-phase nucleic acid synthesis techniques, or, e.g., utilizing array-based oligonucleotide synthetic methods (see e.g., Fodor et al. (1991) Science, 251: 767-777; Fodor (1997) “Genes, Chips and the Human Genome” FASEB Journal. 11:121-121; Fodor (1997) “Massively Parallel Genomics” Science. 277:393-395; and Chee et al. (1996) “Accessing Genetic Information with High-Density DNA Arrays” Science 274:610-614). Additional oligonucleotide synthetic strategies are found, e.g., in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US, filed herewith.
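
As a rough illustration of the design step just described, the following sketch lists the variable columns of a set of pre-aligned homologs and emits one oligonucleotide per observed base at each variable column. It rests on several assumptions that are not part of the disclosure: the sequences are already aligned to equal length without gaps, each oligo is built on the first sequence, only one variable position is substituted per oligo, and the names `variant_columns`, `family_oligos` and `flank` are hypothetical.

```python
def variant_columns(aligned):
    """Map each variable alignment column to the sorted bases seen there."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = {}
    for pos in range(length):
        bases = sorted({seq[pos] for seq in aligned})
        if len(bases) > 1:            # more than one base: a region of diversity
            columns[pos] = bases
    return columns


def family_oligos(aligned, flank=10):
    """Emit one oligo per observed base at each variable column.

    Each oligo copies the first sequence over a window of `flank` bases
    on either side of the variable position (fewer near the sequence ends).
    """
    reference = aligned[0]
    oligos = []
    for pos, bases in variant_columns(aligned).items():
        start = max(0, pos - flank)
        end = min(len(reference), pos + flank + 1)
        for base in bases:
            oligos.append(reference[start:pos] + base + reference[pos + 1:end])
    return oligos
```

With two 25-base homologs that differ only at position 12, `family_oligos` returns two 21-base oligos that are identical except for the central base, i.e., X = 2 oligonucleotides for a locus with two sequence variants.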

In one aspect, oligonucleotides are chosen so that only encoded amino acid alterations are considered in the synthesis strategy. In this strategy, after aligning a family of homologous nucleic acids, family shuffling oligos are synthesized to be degenerate only at those positions where a base change results in an alteration in an encoded polypeptide sequence. This has the advantage of requiring fewer degenerate oligonucleotides to achieve the same degree of diversity in encoded products, thereby simplifying the synthesis of the set of family gene shuffling oligonucleotides.
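
One way to picture this filter is the sketch below, which separates silent codon differences from those that alter the encoded amino acid. It assumes the standard genetic code, in-frame coding sequences aligned without gaps, and uppercase DNA; the names `CODON_TABLE` and `nonsilent_codon_positions` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def nonsilent_codon_positions(aligned_cds):
    """Return codon indices at which the aligned sequences encode different amino acids."""
    positions = []
    for start in range(0, len(aligned_cds[0]) - 2, 3):
        codons = {seq[start:start + 3] for seq in aligned_cds}
        amino_acids = {CODON_TABLE[codon] for codon in codons}
        if len(amino_acids) > 1:      # a base change that alters the polypeptide
            positions.append(start // 3)
    return positions
```

For `"ATGCTTGAA"` and `"ATGCTCGAT"`, the second codon differs only silently (CTT and CTC both encode leucine), so only codon index 2 (GAA versus GAT, glutamate versus aspartate) would be made degenerate.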

In synthesis strategies in general, the oligonucleotides have at least about 10 bases of sequence identity to either side of a region of variance to ensure reasonably efficient hybridization and assembly. However, flanking regions with identical bases can have fewer identical bases (e.g., 5, 6, 7, 8, or 9) and can, of course, have larger regions of identity (e.g., 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 50, or more).
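
The flank guideline above can be checked mechanically. The short sketch below is only an illustration: it assumes variant positions are given as 0-based indices within the oligo, and the helper names `flank_lengths` and `has_sufficient_flanks` are hypothetical. The default of 10 bases follows the "at least about 10 bases" figure in the text.

```python
def flank_lengths(oligo_length, variant_positions):
    """Return the number of invariant bases to the left and right of the variant region."""
    first, last = min(variant_positions), max(variant_positions)
    return first, oligo_length - last - 1


def has_sufficient_flanks(oligo_length, variant_positions, min_flank=10):
    """True when both flanks carry at least min_flank bases of identity."""
    left, right = flank_lengths(oligo_length, variant_positions)
    return left >= min_flank and right >= min_flank
```

A 21-base oligo with its single variant at index 10 has 10 identical bases on each side and passes; the same oligo with the variant at index 5 fails unless `min_flank` is lowered to 5.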

During gene assembly, oligonucleotides can be incubated together and reassembled using any of a variety of polymerase-mediated reassembly methods, e.g., as described herein and as known to one of skill. Selected oligonucleotides can be “spiked” in the recombination mixture at any selected concentration, thus causing preferential incorporation of desirable modifications.

For example, during oligonucleotide elongation, hybridized oligonucleotides are incubated in the presence of a nucleic acid polymerase, e.g., Taq, Klenow, or the like, and dNTP's (i.e., dATP, dCTP, dGTP and dTTP). If regions of sequence identity are large, Taq or other high-temperature polymerase can be used with a hybridization temperature of between about room temperature and, e.g., about 65° C. If the areas of identity are small, Klenow, Taq or other polymerases can be used with a hybridization temperature of below room temperature. The polymerase can be added to nucleic acid fragments (oligonucleotides plus any additional nucleic acids which form a recombination mixture) prior to, simultaneously with, or after hybridization of the oligonucleotides and other recombination components. As noted elsewhere in this disclosure, certain embodiments of the invention can involve denaturing the resulting elongated double-stranded nucleic acid sequences and then hybridizing and elongating those sequences again. This cycle can be repeated any desired number of times, e.g., from about 2 to about 100 times.

Library Spiking

Family oligonucleotides can also be used to vary the nucleic acids present in a typical shuffling mixture; e.g., a mixture of DNase fragments of one or more gene(s) from a homologous set of genes. In one aspect, all of the nucleic acids to be shuffled are aligned as described above. Amino acid variations are noted and/or marked (e.g., in an integrated system comprising a computer running appropriate sequence alignment software, or manually, e.g., on a printout of the sequences or sequence alignments. See also, “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US, filed herewith). As above, family shuffling oligos are designed to incorporate some or all of the amino acid variations coded by the natural sequence diversity for the aligned nucleic acids. One or more nucleic acids corresponding to the homologous set of aligned nucleic acids are cleaved (e.g., using a DNase, or by chemical cleavage). Family shuffling oligos are spiked into the mixture of cleaved nucleic acids, which are then recombined and reassembled into full-length sequences using standard techniques.

To determine the extent of oligonucleotide incorporation, any approach which distinguishes similar nucleic acids can be used. For example, the reassembled nucleic acids can be cloned and sequenced, or amplified (in vitro or by cloning, e.g., into a standard cloning vector) and cleaved with a restriction enzyme which specifically recognizes a particular polymorphic sequence present in the family shuffling oligos, but not present in the same position in the original cleaved nucleic acid(s).

In another embodiment, oligonucleotides are selected which incorporate one or more sequence variations corresponding to an amino acid polymorphism, but which eliminate polymorphic nucleotide variations between nucleic acid sequences which correspond to silent substitutions. One advantage of this strategy is that the elimination of silent substitutions can make a given sequence more similar to a given substrate for recombination (e.g., a selected target nucleic acid). This increased similarity permits nucleic acid recombination among sequences which might otherwise be too diverse for efficient recombination.

For example, a selected nucleic acid can be PCR amplified using standard methods. The selected nucleic acid is cleaved and mixed with a library of family gene shuffling oligonucleotides which are rendered as similar as possible to the corresponding sequences of the selected nucleic acid by making the oligonucleotides include the same silent substitution set found in the selected nucleic acid. The oligonucleotides are spiked at a selected concentration into the cleavage mixture, which is then reassembled into full-length sequences. The quality of the resulting library (e.g., frequency at which the oligos are incorporated into the reassembled sequences) is checked, as noted above, by cloning (or otherwise amplifying) and sequencing and/or restriction digesting the reassembled sequences.

PCR elongation strategies can also be used to make libraries using different molar ratios of oligonucleotides in the recombination mixtures (see also, e.g., WO 97/20078, WO 98/42832 and WO 98/01581).

Iterative Oligonucleotide Formats

In one aspect, the present invention provides iterative oligonucleotide-mediated recombination formats. These formats can be combined with standard recombination methods, also, optionally, in an iterative format.

In particular, recombinant nucleic acids produced by oligonucleotide-mediated recombination can be screened for activity and sequenced. The sequenced recombinant nucleic acids are aligned and regions of identity and diversity are identified. Family shuffling oligonucleotides are then selected for recombination of the sequenced recombinant nucleic acids. This process of screening, sequencing active recombinant nucleic acids and recombining the active recombinant nucleic acids can be iteratively repeated until a molecule with a desired property is obtained.

In addition, recombinant nucleic acids made using family shuffling oligonucleotides can be cleaved and shuffled using standard recombination methods, which are, optionally, reiterative. Standard recombination can be used in conjunction with oligonucleotide shuffling and either or both steps are optionally reiteratively repeated.

One useful example of iterative shuffling by oligonucleotide mediated recombination of family oligonucleotides occurs when extremely fine grain shuffling is desired. For example, small genes encoding small proteins such as defensins (antifungal proteins of about 50 amino acids), EF40 (an antifungal protein family of about 28 amino acids), peptide antibiotics, peptide insecticidal proteins, peptide hormones, many cytokines and many other small proteins, are difficult to recombine by standard recombination methods, because the recombination often occurs with a frequency that is roughly the same as the size of the gene to be recombined, limiting the diversity resulting from recombination. In contrast, oligonucleotide-mediated recombination methods can recombine essentially any region of diversity in any set of sequences, with recombination events (e.g., crossovers) occurring at any selected base-pair.

Thus, libraries of sequences prepared by recursive oligonucleotide mediated recombination are optionally screened and selected for a desired property, and improved (or otherwise desirable) clones are sequenced (or otherwise deconvoluted, e.g., by real time PCR analysis such as FRET or TaqMan, or using restriction enzyme analysis) with the process being iteratively repeated to generate additional libraries of nucleic acids. Thus, additional recombination rounds are performed either by standard fragmentation-based recombination methods, or by sequencing positive clones, designing appropriate family shuffling oligonucleotides and performing a second round of recombination/selection to produce an additional library (which can be recombined as described). In addition, libraries made from different recombination rounds can also be recombined, either by sequencing/oligonucleotide recombination or by standard recombination methods.

Crossover PCR Shuffling

In one aspect, the present invention provides for shuffling of distantly related or even non-homologous sequences. In this embodiment, PCR crossover oligonucleotides are designed with a first region derived from a first nucleic acid and a second region corresponding to a second nucleic acid. Additional oligos are designed which correspond to either the first or second nucleic acid, and which have sequences that are complementary (or identical) to the crossover oligos. By recombining these oligos (i.e., hybridizing them and then elongating the hybridized oligonucleotides in successive polymerase-mediated elongation reactions), a substrate is provided which can recombine with either the first or second nucleic acid, and which will, at the same time, incorporate sequences from the other nucleic acid.
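
The structure of a crossover oligonucleotide can be sketched as a simple string operation. The example below is illustrative only: the choice of junction coordinates, the arm length, and the function names `crossover_oligo` and `reverse_complement` are assumptions, and sequences are taken to be uppercase DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def crossover_oligo(first, second, first_end, second_start, arm=20):
    """Join the `arm` bases of `first` that end at first_end to the
    `arm` bases of `second` that begin at second_start."""
    left = first[max(0, first_end - arm):first_end]      # region derived from the first nucleic acid
    right = second[second_start:second_start + arm]      # region corresponding to the second
    return left + right
```

Here `crossover_oligo("AAAAACCCCC", "GGGGGTTTTT", 5, 5, arm=3)` gives `"AAATTT"`, and `reverse_complement` can be applied to either arm to obtain a complementary oligo that hybridizes to it during elongation.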

In Vivo Oligonucleotide Recombination Utilizing Family Shuffling Chimeraplasts

Chimeraplasts are synthetic RNA-DNA hybrid molecules which have been used for “genetic surgery” in which one or a few bases in a genomic DNA are changed by recombination with the chimeric molecule. The chimeraplasts are chimeric nucleic acids composed of contiguous stretches of RNA and DNA residues in a duplex conformation with double hairpin caps on the ends of the molecules (Yoon et al. (1996) PNAS 93:2071-2076). The RNA-DNA sequence is designed to align with the sequence of a locus to be altered by recombination with the chimeraplast, with the chimeraplast having the desired change in base sequence for the locus. The host cell repair machinery converts the host cell sequence to that of the chimeraplast. For brief reviews of the technique see, Bartlett (1998) Nature Biotechnology 16:1312; Strauss (1998) Nature Medicine 4:274-275.

This strategy has been used for targeted correction of a point mutation in the gene for human liver/kidney/bone alkaline phosphatase encoded on an episomal DNA in mammalian cells (Yoon, id.). The strategy was also used for correction of the mutation responsible for sickle cell anemia in genomic DNA in lymphoblastoid cells (Cole-Strauss et al. (1996) Science 1386-1389). Alexeev and Yoon (1998) Nature Biotechnology 1343-1346 describe the use of a hybrid RNA-DNA oligonucleotide (an “RDO”) to make a point correction in the mouse tyrosinase gene, resulting in correction of an albino mutation in mouse cells and production of black pigmentation by the cells. Kren et al. (1998) Nature Medicine 4(3):285-290 describe in vivo site-directed mutagenesis of the factor IX gene by chimeric RNA/DNA oligonucleotides. Xiang et al. (1997) J. Mol. Med. 75:829-835 describe targeted gene conversion in a mammalian CD34⁺-enriched cell population using a chimeric RNA-DNA oligonucleotide. Kren et al. (1997) Hepatology 25(6):1462-1468 describe targeted nucleotide exchange in the alkaline phosphatase gene of Hu-H-7 cells mediated by a chimeric RNA-DNA oligonucleotide.

In one aspect of the present invention, the family shuffling oligonucleotides are chimeraplasts. In this embodiment, family shuffling oligonucleotides are made as set forth herein, to additionally include structural chimeraplast features. For example, in the references noted above, DNA-RNA oligos are synthesized according to standard phosphoramidite coupling chemistries (the nucleotides utilized optionally include non-standard nucleotides such as 2-O methylated RNA nucleotides). The oligos have a “dual hairpin” structure (e.g., having a T loop at the ends of the structure) as set forth in the references noted above.

The set of family shuffling chimeraplasts each include regions of identity to a target gene of interest, and regions of diversity corresponding to the diversity (i.e., the sequence variation for a particular subsequence) found in the target gene of interest. As set forth in FIG. 1, the set of oligonucleotides is transduced into cells (e.g., plant cells), where the chimeraplasts recombine with a sequence of interest in the genome of the cells, thereby creating a library of cells with at least one region of diversity at a target gene of interest. The library is then screened and selected as described herein. Optionally, the selected library members are subjected to an additional round of chimeraplast recombination with the same or different set of chimeraplast oligonucleotides, followed by selection/screening assays as described.

For example, chimeraplasts are synthesized with sequences which correspond to regions of sequence diversity observed following an alignment of homologous nucleic acids. That is, the chimeraplasts each contain one or a few nucleotides which, following incorporation of the chimeraplasts into one or more target sequences, result in conversion of a subsequence of a gene into a subsequence found in a homologous gene. By transducing a library of homologous chimeraplast sequences into a population of cells, the target gene of interest within the cells is converted at one or more positions to a sequence derived from one or more homologous sequences. Thus, the effect of transducing the cell population with the chimeraplast library is to create a library of target genes corresponding to the sequence diversity found in genes homologous to the target sequence.

Chimeraplasts can also be similarly used to convert the target gene at selected positions with non-homologous sequence choices, e.g., where structural or other information suggests the desirability of such a conversion. In this embodiment, the chimeraplasts include sequences corresponding to non-homologous sequence substitutions.

Optionally, the chimeraplasts, or a co-transfected DNA, can incorporate sequence tags, selectable markers, or other structural features to permit selection or recovery of cells in which the target gene has recombined with the chimeraplast. For example, a co-transfected DNA can include a marker such as drug resistance, or expression of a detectable marker (e.g., Lac Z, or green fluorescent protein).

In addition, sequences in the chimeraplast can be used as purification or amplification tags. For example, a portion of the chimeraplast can be complementary to a PCR primer. In this embodiment, PCR primers are used to synthesize recombinant genes from the cells of the library. Similarly, PCR primers can bracket regions of interest, including regions in which recombination between a chimeraplast and a standard DNA occurs. Other PCR, restriction enzyme digestion and/or cloning strategies which result in the isolation of nucleic acids resulting from recombination between the chimeraplast and a target nucleic acid can also be used to recover the recombined nucleic acid, which is optionally recombined with additional nucleic acids. Reiterative cycles of chimeraplast-mediated recombination, recovery of recombinant nucleic acids and recombination of the recovered nucleic acids can be performed using standard recombination methods. Selection cycles can be performed after any recombination event to select for desirable nucleic acids, or, alternatively, several rounds of recombination can be performed prior to performing a selection step.

Libraries of Chimeraplasts and Other Gene Recombination Vehicles

As noted above, chimeraplasts are generally useful structures for modification of nucleotide sequences in target genes, in vivo. Accordingly, structures which optimize chimeraplast activity are desirable. Thus, in addition to the use of chimeraplasts in in vitro and in vivo recombination formats as noted, the present invention also provides for the optimization of chimeraplast activity in vitro and in vivo, as well as for a number of related libraries and other compositions.

In particular, a marker can be incorporated into a library of related chimeraplasts. The marker is placed between the ends of the chimeraplast in the region of the molecule which is incorporated into a target nucleic acid following recombination between the chimeraplast and the target nucleic acid. For example, the marker can cause a detectable phenotypic effect in a cell in which recombination occurs, or the marker can simply lead to a change in the target sequence which can be detected by standard nucleic acid sequence detection techniques (e.g., PCR amplification of the sequence or of a flanking sequence, LCR, restriction enzyme digestion of a sequence created by a recombination event, binding of the recombined nucleic acid to an array (e.g., a gene chip), and/or sequencing of the recombined nucleic acid, etc.). Ordinarily, the regions of sequence difference are determined to provide an indication of which sequences have increased recombination rates.

The library of related chimeraplasts includes chimeraplasts with regions of sequence divergence in the T loop hairpin regions and in the region between the T loop hairpin region flanking the marker. This divergence can be produced by synthetic strategies which provide for production of heterologous sequences as described herein.

For example, chimeraplasts which are largely identical in sequence, except for variant nucleotide(s), are produced to simplify synthetic strategies. Because of this similarity, parallel or pooled synthesis strategies can be used in which a single synthesis reaction or set of reagents is used to make common portions of each oligonucleotide. This can be performed e.g., by well-known solid-phase nucleic acid synthesis techniques, e.g., in a commercially available oligonucleotide synthesizer, or, e.g., by utilizing array-based oligonucleotide synthetic methods (see e.g., Fodor et al. (1991) Science, 251: 767-777; Fodor (1997) “Genes, Chips and the Human Genome” FASEB Journal. 11:121-121; Fodor (1997) “Massively Parallel Genomics” Science. 277:393-395; and Chee et al. (1996) “Accessing Genetic Information with High-Density DNA Arrays” Science 274:610-614). Accordingly, one feature of the present invention is a library of chimeraplasts produced by these methods, i.e., a library of chimeraplasts which share common sequence elements, including e.g., a common marker, as well as regions of difference, e.g., different sequences in the hairpin regions of the molecule.

The library which is produced by these methods is screened for increased recombination rates as noted above. Library members which are identified as having increased rates of recombination are optionally themselves recombined to produce libraries of recombined chimeraplasts. Recombination is ordinarily performed by assessing the sequences of the members which initially display increased recombination rates, followed by synthesis of chimeraplasts which display structural similarity to at least two of these members. This process can be iteratively repeated to create new “recombinant” chimeraplasts with increased recombination activity, as well as libraries of such chimeraplasts.

Other recombination molecules can similarly be produced by these methods. For example, Cre-Lox sites, Chi sites and other recombination facilitating sequences in cell transduction/transformation vectors are varied and selected in the same manner as noted above. Where the sequences are simple DNA sequences, they can be recombined either by the synthetic methods noted herein, and/or by standard DNA shuffling methods.

Codon-Varied Oligonucleotides

Codon-varied oligonucleotides are oligonucleotides, similar in sequence but with one or more base variations, where the variations correspond to at least one encoded amino acid difference. They can be synthesized utilizing tri-nucleotide, i.e., codon-based phosphoramidite coupling chemistry, in which tri-nucleotide phosphoramidites representing codons for all 20 amino acids are used to introduce entire codons into oligonucleotide sequences synthesized by this solid-phase technique. Preferably, all of the oligonucleotides of a selected length (e.g., about 20, 30, 40, 50, 60, 70, 80, 90, or 100 or more nucleotides) which incorporate the chosen nucleic acid sequences are synthesized. In the present invention, codon-varied oligonucleotide sequences can be based upon sequences from a selected set of homologous nucleic acids.

The synthesis of tri-nucleotide phosphoramidites, their subsequent use in oligonucleotide synthesis, and related issues are described in, e.g., Virnekäs, B., et al., (1994) Nucleic Acids Res., 22, 5600-5607; Kayushin, A. L. et al., (1996) Nucleic Acids Res., 24, 3748-3755; Huse, U.S. Pat. No. 5,264,563 “PROCESS FOR SYNTHESIZING OLIGONUCLEOTIDES WITH RANDOM CODONS”; Lyttle et al., U.S. Pat. No. 5,717,085 “PROCESS FOR PREPARING CODON AMIDITES”; Shortle et al., U.S. Pat. No. 5,869,644 “SYNTHESIS OF DIVERSE AND USEFUL COLLECTIONS OF OLIGONUCLEOTIDES”; Greyson, U.S. Pat. No. 5,789,577 “METHOD FOR THE CONTROLLED SYNTHESIS OF POLYNUCLEOTIDE MIXTURES WHICH ENCODE DESIRED MIXTURES OF PEPTIDES”; and Huse, WO 92/06176 “SURFACE EXPRESSION LIBRARIES OF RANDOMIZED PEPTIDES”.

Codon-varied oligonucleotides can be synthesized using various trinucleotide-related techniques, e.g., the trinucleotide synthesis format and the split-pool synthesis format. The chemistry involved in both the trinucleotide and the split-pool codon-varied oligonucleotide synthetic methods is well known to those of skill. In general, both methods utilize phosphoramidite solid-phase chemical synthesis in which the 3′ ends of nucleic acid substrate sequences are covalently attached to a solid support, e.g., controlled pore glass. The 5′ protecting groups can be, e.g., a triphenylmethyl group, such as dimethoxytrityl (DMT) or monomethoxytrityl; a carbonyl-containing group, such as 9-fluorenylmethyloxycarbonyl (FMOC) or levulinoyl; an acid-cleavable group, such as pixyl; a fluoride-cleavable alkylsilyl group, such as tert-butyl dimethylsilyl (T-BDMSi), triisopropyl silyl, or trimethylsilyl. The 3′ protecting groups can be, e.g., β-cyanoethyl groups.

The trinucleotide synthesis format includes providing a substrate sequence having a 5′ terminus and at least one base, both of which have protecting groups thereon. The 5′ protecting group of the substrate sequence is then removed to provide a 5′ deprotected substrate sequence, which is then coupled with a selected trinucleotide phosphoramidite sequence. The trinucleotide has a 3′ terminus, a 5′ terminus, and three bases, each of which has protecting groups thereon. The coupling step yields an extended oligonucleotide sequence. Thereafter, the removing and coupling steps are optionally repeated. When these steps are repeated, the extended oligonucleotide sequence yielded by each repeated coupling step becomes the substrate sequence of the next repeated removing step until a desired codon-varied oligonucleotide is obtained. This basic synthesis format can optionally include coupling together one or more of: mononucleotides, trinucleotide phosphoramidite sequences, and oligonucleotides.

The split-pool synthesis format includes providing substrate sequences, each having a 5′ terminus and at least one base, both of which have protecting groups thereon. The 5′ protecting groups of the substrate sequences are removed to provide 5′ deprotected substrate sequences, which are then coupled with selected trinucleotide phosphoramidite sequences. Each trinucleotide has a 3′ terminus, a 5′ terminus, and three bases, all of which have protecting groups thereon. The coupling step yields extended oligonucleotide sequences. Thereafter, the removing and coupling steps are optionally repeated. When these steps are repeated, the extended oligonucleotide sequences yielded by each repeated coupling step become the substrate sequences of the next repeated removing step until extended intermediate oligonucleotide sequences are produced.

Additional steps of the split-pool format optionally include splitting the extended intermediate oligonucleotide sequences into two or more separate pools. After this is done, the 5′ protecting groups of the extended intermediate oligonucleotide sequences are removed to provide 5′ deprotected extended intermediate oligonucleotide sequences in the two or more separate pools. Following this, these 5′ deprotected intermediates are coupled with one or more selected mononucleotides, trinucleotide phosphoramidite sequences, or oligonucleotides in the two or more separate pools to yield further extended intermediate oligonucleotide sequences. In turn, these further extended sequences are pooled into a single pool. Thereafter, the steps beginning with the removal of the 5′ protecting groups of the substrate sequences to provide 5′ deprotected substrate sequences are optionally repeated. When these steps are repeated, the further extended oligonucleotide sequences, yielded by each repeated coupling step that generates those specific sequences, become the substrate sequences of the next repeated removing step that includes those specific sequences until desired codon-varied oligonucleotides are obtained.
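
The combinatorial outcome of the split-pool format can be modeled in a few lines. The sketch below is a bookkeeping illustration, not a description of the chemistry: protecting-group removal is not modeled, every coupling is assumed to go to completion, and the function name `split_pool` and its input layout are hypothetical. Because the 3′ end is attached to the support, each codon is added at the 5′ end of the growing sequence.

```python
def split_pool(coupling_rounds):
    """Simulate split-pool synthesis of codon-varied oligonucleotides.

    coupling_rounds lists, in coupling order (3' to 5'), the trinucleotides
    offered in each round. Each round splits the current pool into one
    sub-pool per trinucleotide, couples it at the 5' end, and re-pools.
    Returns the distinct sequences, written 5' to 3'.
    """
    pool = [""]
    for trinucleotides in coupling_rounds:
        next_pool = []
        for sequence in pool:                 # split into separate pools
            for codon in trinucleotides:      # couple one codon per pool
                next_pool.append(codon + sequence)
        pool = next_pool                      # combine into a single pool
    return pool
```

Calling `split_pool([["GAA", "GAT"], ["CTT", "ATT"], ["ATG"]])` yields four 9-base oligos beginning with ATG, one for each combination of the two varied codons.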

Both synthetic protocols described, supra, can optionally be performed in an automated synthesizer that automatically performs the steps. This aspect includes inputting character string information into a computer, the output of which then directs the automated synthesizer to perform the steps necessary to synthesize the desired codon-varied oligonucleotides.

Further details regarding tri-nucleotide synthesis are found in “USE OF CODON VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., U.S. Ser. No. 09/408,393, filed Sep. 28, 1999.

Tuning Nucleic Acid Recombination Using Oligonucleotide-Mediated Blending

In one aspect, non-equimolar ratios of family shuffling oligonucleotides are used to bias recombination during the procedures noted herein. In this approach, equimolar ratios of family shuffling oligonucleotides in a set of family shuffling oligonucleotides are not used to produce a library of recombinant nucleic acids, as in certain other methods herein. Instead, ratios of particular oligonucleotides which correspond to the sequences of a selected member or selected set of members of the family of nucleic acids from which the family shuffling oligonucleotides are derived are selected by the practitioner.

Thus, in one simple illustrative example, oligonucleotide mediated recombination as described herein is used to recombine, e.g., a frog gene and a human gene which are 50% identical. Family oligonucleotides are synthesized which encode both the human and the frog sequences at all polymorphic positions. However, rather than using an equimolar ratio of the human and frog derived oligonucleotides, the ratio is biased in favor of the gene that the user wishes to emulate most closely. For example, when generating a human-like gene, the ratio of oligonucleotides which correspond to the human sequence at polymorphic positions can be biased to greater than 50% (e.g., about 60%, 70%, 80%, or 90% or more of the oligos can correspond to the human sequence, with, e.g., about 40%, 30%, 20%, 10%, or less of the oligos corresponding to the frog sequence). Similarly, if one wants a frog-like gene, the ratio of oligonucleotides which correspond to the frog sequence at polymorphic positions can be biased to greater than 50%. In either case, the resulting “blended” gene (i.e., the resulting recombinant gene with characteristics of more than one parent gene) can then be recombined with gene family members which are closely related by sequence to the blended gene. Thus, in the example above, where the ratio of oligonucleotides is selected to produce a more human-like blended gene, the blended gene is optionally further recombined with genes more closely similar to the original human gene. Similarly, where the ratio of oligonucleotides is selected to produce a more frog-like blended gene, the blended gene is optionally further recombined with genes more closely similar to the original frog gene. This strategy is set out in FIG. 2. The strategy is generally applicable to the recombination of any two or more nucleic acids by oligonucleotide mediated recombination.

Biasing can be accomplished in a variety of ways, including synthesizing disproportionate amounts of the relevant oligonucleotides, or simply supplying disproportionate amounts to the relevant gene synthesis method (e.g., to a PCR synthetic method as noted, supra).
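The biasing arithmetic can be illustrated with a minimal Python sketch. The function names, the 100 pmol total, and the assumption that each polymorphic position independently takes the human or frog sequence with probability equal to the molar fraction supplied are illustrative simplifications, not part of the methods described herein.

```python
import random

def oligo_amounts(human_fraction, total_pmol=100.0):
    """Split a total amount of oligo for one polymorphic position between
    the human-type and frog-type oligonucleotides (hypothetical units)."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must lie between 0 and 1")
    human = total_pmol * human_fraction
    return {"human": human, "frog": total_pmol - human}

def simulate_blended_gene(n_positions, human_fraction, rng):
    """Assign a parental origin to each polymorphic position.
    Assumption: incorporation probability equals the molar fraction and
    positions are independent, which real reassembly need not obey."""
    return ["human" if rng.random() < human_fraction else "frog"
            for _ in range(n_positions)]

if __name__ == "__main__":
    rng = random.Random(0)
    gene = simulate_blended_gene(200, 0.8, rng)
    print(oligo_amounts(0.8))
    print("human-derived positions:", gene.count("human"), "of", len(gene))
```

Under these assumptions, an 80:20 human to frog ratio yields blended genes in which roughly 80% of polymorphic positions carry the human sequence.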

As noted, this biasing approach can be applied to the recombination of any set of two or more related nucleic acids. Sequences do not have to be closely similar for selection to proceed. In fact, sequences do not even have to be detectably homologous for biasing to occur. In this case, “family” oligonucleotides are replaced by sets of oligonucleotides which are not sequence homologous and which are derived from consideration of structural similarity of the encoded proteins. For example, the immunoglobulin superfamily includes structurally similar members which display little or no detectable sequence homology (especially at the nucleic acid level). In these cases, non-homologous sequences are “aligned” by considering structural homology (e.g., by alignment of functionally similar peptide residues). A recombination space of interest can be defined which includes all permutations of the amino acid diversity represented by the alignment. The above biasing method is optionally used to blend the sequences with desired ratios of the nucleotides encoding relevant structurally similar amino acid sequences.

Any two or more sequences can be aligned by any algorithm or criteria of interest and the biasing method used to blend the sequences based upon any desired criteria. These include sequence homology, structural similarity, predicted structural similarity (based upon any similarity criteria which are specified), or the like. The method can be applied to situations in which a structural core is constant but many structural variations are built around the core (for example, an Ig domain can be a structural core to which loops of many different lengths and conformations are attached).

A general advantage to this approach as compared to standard gene recombination methods is that the overall sequence identity of two sequences to be blended can be lower than the identity necessary for recombination to occur by more standard methods. In addition, sometimes only selected regions are recombined, making it possible to take any structural or functional data which is available into account in specifying how the blended gene is constructed. Thus, sequence space which is not produced by some other shuffling protocols is accessed by the blended gene approach and a higher percentage of active clones can sometimes be obtained if structural information is taken into consideration. Further details regarding consideration of structural information are found in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US.

The general strategy above is applicable, e.g., to any set of genes with low sequence similarity. For example, there is a large family of TNF homologues whose sequence identity is in the range of about 30%, making standard shuffling protocols difficult to achieve. Of course, tuning recombination by selecting oligonucleotide proportions is also generally applicable to recombination of any two nucleic acids, including both high similarity homologues and low similarity homologues. Any alignment protocol can be selected to align two or more sequences and the resulting alignment can be used to create appropriate oligonucleotides to achieve recombination, and any biasing in the relative frequencies of sequences as compared to parental sequences can be achieved.

Targets for Oligonucleotide Shuffling

Essentially any nucleic acid can be shuffled by the oligonucleotide mediated methods herein. No attempt is made to identify the hundreds of thousands of known nucleic acids. As noted above, common sequence repositories for known proteins include GenBank, EMBL, DDBJ and the NCBI. Other repositories can easily be identified by searching the internet.

One class of preferred targets for activation includes nucleic acids encoding therapeutic proteins such as erythropoietin (EPO), insulin, peptide hormones such as human growth hormone; growth factors and cytokines such as epithelial Neutrophil Activating Peptide-78, GROα/MGSA, GROβ, GROγ, MIP-1α, MIP-16, MCP-1, epidermal growth factor, fibroblast growth factor, hepatocyte growth factor, insulin-like growth factor, the interferons, the interleukins, keratinocyte growth factor, leukemia inhibitory factor, oncostatin M, PD-ECSF, PDGF, pleiotropin, SCF, c-kit ligand, VEGF, G-CSF etc. Many of these proteins are commercially available (See, e.g., the Sigma BioSciences 1997 catalogue and price list), and the corresponding genes are well-known.

Another class of preferred targets are transcriptional and expression activators. Example transcriptional and expression activators include genes and proteins that modulate cell growth, differentiation, regulation, or the like. Expression and transcriptional activators are found in prokaryotes, viruses, and eukaryotes, including fungi, plants, and animals, including mammals, providing a wide range of therapeutic targets. It will be appreciated that expression and transcriptional activators regulate transcription by many mechanisms, e.g., by binding to receptors, stimulating a signal transduction cascade, regulating expression of transcription factors, binding to promoters and enhancers, binding to proteins that bind to promoters and enhancers, unwinding DNA, splicing pre-mRNA, polyadenylating RNA, and degrading RNA. Expression activators include cytokines, inflammatory molecules, growth factors, their receptors, and oncogene products, e.g., interleukins (e.g., IL-1, IL-2, IL-8, etc.), interferons, FGF, IGF-I, IGF-II, PDGF, TNF, TGF-α, TGF-β, EGF, KGF, SCF/c-Kit, CD40L/CD40, VLA-4/VCAM-1, ICAM-1/LFA-1, and hyalurin/CD44; signal transduction molecules and corresponding oncogene products, e.g., Mos, Ras, Raf, and Met; and transcriptional activators and suppressors, e.g., p53, Tat, Fos, Myc, Jun, Myb, Rel, and steroid hormone receptors such as those for estrogen, progesterone, testosterone, aldosterone, the LDL receptor ligand and corticosterone.

RNases such as Onconase and EDN are preferred targets for the synthetic methods herein, particularly those methods utilizing gene blending. One of skill will appreciate that both frog and human RNases are known and are known to have a number of important pharmacological activities. Because of the evolutionary divergence between these genes, oligonucleotide-mediated recombination methods are particularly useful in recombining the nucleic acids.

Similarly, proteins from infectious organisms are preferred targets for possible vaccine applications, described in more detail below. Such organisms include infectious fungi, e.g., Aspergillus and Candida species; bacteria, particularly E. coli, which serves as a model for pathogenic bacteria, as well as medically important bacteria such as Staphylococci (e.g., aureus), Streptococci (e.g., pneumoniae), Clostridia (e.g., perfringens), Neisseria (e.g., gonorrhoeae), Enterobacteriaceae (e.g., coli), Helicobacter (e.g., pylori), Vibrio (e.g., cholerae), Campylobacter (e.g., jejuni), Pseudomonas (e.g., aeruginosa), Haemophilus (e.g., influenzae), Bordetella (e.g., pertussis), Mycoplasma (e.g., pneumoniae), Ureaplasma (e.g., urealyticum), Legionella (e.g., pneumophila), Spirochetes (e.g., Treponema, Leptospira, and Borrelia), Mycobacteria (e.g., tuberculosis, smegmatis), Actinomyces (e.g., israelii), Nocardia (e.g., asteroides), Chlamydia (e.g., trachomatis), Rickettsia, Coxiella, Ehrlichia, Rochalimaea, Brucella, Yersinia, Francisella, and Pasteurella; protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viruses such as (+) RNA viruses (examples include Poxviruses, e.g., vaccinia; Picornaviruses, e.g., polio; Togaviruses, e.g., rubella; Flaviviruses, e.g., HCV; and Coronaviruses), (−) RNA viruses (examples include Rhabdoviruses, e.g., VSV; Paramyxoviruses, e.g., RSV; Orthomyxoviruses, e.g., influenza; Bunyaviruses; and Arenaviruses), dsDNA viruses (Reoviruses, for example), RNA to DNA viruses, i.e., Retroviruses, e.g., especially HIV and HTLV, and certain DNA to RNA viruses such as Hepatitis B virus.

Other proteins relevant to non-medical uses, such as inhibitors of transcription or toxins of crop pests, e.g., insects, fungi, weed plants, and the like, are also preferred targets for oligonucleotide shuffling. Industrially important enzymes such as monooxygenases (e.g., p450s), proteases, nucleases, and lipases are also preferred targets. As an example, subtilisin can be evolved by shuffling family oligonucleotides for homologous forms of the gene for subtilisin. Von der Osten et al., J. Biotechnol. 28:55-68 (1993) provide example subtilisin coding nucleic acids, and additional nucleic acids are present in GENBANK®. Proteins which aid in folding, such as the chaperonins, are also preferred targets.

Preferred known genes suitable for oligonucleotide mediated shuffling also include the following: Alpha-1 antitrypsin, Angiostatin, Antihemolytic factor, Apolipoprotein, Apoprotein, Atrial natriuretic factor, Atrial natriuretic polypeptide, Atrial peptides, C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG), Calcitonin, CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262), CD40 ligand, Collagen, Colony stimulating factor (CSF), Complement factor 5a, Complement inhibitor, Complement receptor 1, Factor IX, Factor VII, Factor VIII, Factor X, Fibrinogen, Fibronectin, Glucocerebrosidase, Gonadotropin, Hedgehog proteins (e.g., Sonic, Indian, Desert), Hemoglobin (for blood substitute; for radiosensitization), Hirudin, Human serum albumin, Lactoferrin, Luciferase, Neurturin, Neutrophil inhibitory factor (NIF), Osteogenic protein, Parathyroid hormone, Protein A, Protein G, Relaxin, Renin, Salmon calcitonin, Salmon growth hormone, Soluble complement receptor I, Soluble I-CAM 1, Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15), Soluble TNF receptor, Somatomedin, Somatostatin, Somatotropin, Streptokinase, Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Toxic shock syndrome toxin (TSST-1), Exfoliating toxins A and B, Pyrogenic exotoxins A, B, and C, and M. arthritides mitogen, Superoxide dismutase, Thymosin alpha 1, Tissue plasminogen activator, Tumor necrosis factor beta (TNF beta), Tumor necrosis factor receptor (TNFR), Tumor necrosis factor-alpha (TNF alpha) and Urokinase.

Small proteins such as defensins (antifungal proteins of about 50 amino acids), EF40 (an antifungal protein of 28 amino acids), peptide antibiotics, and peptide insecticidal proteins are also preferred targets and exist as families of related proteins. Nucleic acids encoding small proteins are particularly preferred targets, because conventional recombination methods provide only limited product sequence diversity. This is because conventional recombination methodology produces crossovers between homologous sequences about every 50-100 base pairs. This means that for very short recombination targets, crossovers occur by standard techniques about once per molecule. In contrast, the oligonucleotide shuffling formats herein provide for recombination of small nucleic acids, as the practitioner selects any “cross-over” desired.
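The crossover estimate above follows from simple arithmetic, sketched here in Python. The function name is hypothetical, and the calculation assumes crossovers are spaced evenly along the coding sequence, which is an idealization.

```python
def expected_crossovers(protein_length_aa, spacing_bp):
    """Approximate crossovers per molecule for a coding sequence, assuming
    one crossover per `spacing_bp` base pairs (an even-spacing idealization)."""
    coding_length_bp = 3 * protein_length_aa
    return coding_length_bp / spacing_bp

if __name__ == "__main__":
    # A 28 amino acid target and a target of about 50 amino acids,
    # evaluated at both ends of the 50-100 base pair range.
    for length_aa in (28, 50):
        for spacing in (50, 100):
            print(length_aa, "aa,", spacing, "bp spacing:",
                  expected_crossovers(length_aa, spacing))
```

For a 28 amino acid target (84 base pairs) and one crossover per 100 base pairs, this gives fewer than one crossover per molecule, which is why conventional methods yield limited diversity for such targets.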

Additional preferred targets are described in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US and other references herein.

DNA Shuffling and Gene Reassembly: Hybrid Synthetic Shuffling Methods

One aspect of the present invention is the ability to use family shuffling oligonucleotides and cross over oligonucleotides as recombination templates/intermediates in various DNA shuffling methods. In addition, nucleic acids made by the new synthetic techniques herein can be reshuffled by other available shuffling methodologies.

A variety of such methods are known, including those taught by the inventors and their coworkers. The following publications describe a variety of recursive recombination procedures and/or related methods which can be practiced in conjunction with the processes of the invention: Stemmer et al. (1999) “Molecular breeding of viruses for targeting and other clinical properties” Tumor Targeting 4:1-4; Ness et al. (1999) “DNA Shuffling of subgenomic sequences of subtilisin” Nature Biotechnology 17:893-896; Chang et al. (1999) “Evolution of a cytokine using DNA family shuffling” Nature Biotechnology 17:793-797; Minshull and Stemmer (1999) “Protein evolution by molecular breeding” Current Opinion in Chemical Biology 3:284-290; Christians et al. (1999) “Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling” Nature Biotechnology 17:259-264; Crameri et al. (1998) “DNA shuffling of a family of genes from diverse species accelerates directed evolution” Nature 391:288-291; Crameri et al. (1997) “Molecular evolution of an arsenate detoxification pathway by DNA shuffling,” Nature Biotechnology 15:436-438; Zhang et al. (1997) “Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening” Proceedings of the National Academy of Sciences, U.S.A. 94:4504-4509; Patten et al. (1997) “Applications of DNA Shuffling to Pharmaceuticals and Vaccines” Current Opinion in Biotechnology 8:724-733; Crameri et al. (1996) “Construction and evolution of antibody-phage libraries by DNA shuffling” Nature Medicine 2:100-103; Crameri et al. (1996) “Improved green fluorescent protein by molecular evolution using DNA shuffling” Nature Biotechnology 14:315-319; Gates et al. (1996) “Affinity selective isolation of ligands from peptide libraries through display on a lac repressor ‘headpiece dimer’” Journal of Molecular Biology 255:373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” In: The Encyclopedia of Molecular Biology. VCH Publishers, New York. pp. 447-457; Crameri and Stemmer (1995) “Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wildtype cassettes” BioTechniques 18:194-195; Stemmer et al. (1995) “Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides” Gene 164:49-53; Stemmer (1995) “The Evolution of Molecular Computation” Science 270:1510; Stemmer (1995) “Searching Sequence Space” Bio/Technology 13:549-553; Stemmer (1994) “Rapid evolution of a protein in vitro by DNA shuffling” Nature 370:389-391; and Stemmer (1994) “DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution.” Proceedings of the National Academy of Sciences, U.S.A. 91:10747-10751.

Additional details regarding DNA shuffling methods are found in U.S. Patents by the inventors and their co-workers, including: U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), “METHODS FOR IN VITRO RECOMBINATION;” U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) “METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION;” U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY;” U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “END-COMPLEMENTARY POLYMERASE REACTION,” and U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), “METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING.”

In addition, details and formats for nucleic acid shuffling are found in a variety of PCT and foreign patent application publications, including: Stemmer and Crameri, “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY” WO 95/22625; Stemmer and Lipschutz “END COMPLEMENTARY POLYMERASE CHAIN REACTION” WO 96/33207; Stemmer and Crameri “METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION” WO 97/0078; Minshull and Stemmer, “METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING” WO 97/35966; Punnonen et al. “TARGETING OF GENETIC VACCINE VECTORS” WO 99/41402; Punnonen et al. “ANTIGEN LIBRARY IMMUNIZATION” WO 99/41383; Punnonen et al. “GENETIC VACCINE VECTOR ENGINEERING” WO 99/41369; Punnonen et al. “OPTIMIZATION OF IMMUNOMODULATORY PROPERTIES OF GENETIC VACCINES” WO 9941368; Stemmer and Crameri, “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY” EP 0934999; Stemmer “EVOLVING CELLULAR DNA UPTAKE BY RECURSIVE SEQUENCE RECOMBINATION” EP 0932670; Stemmer et al., “MODIFICATION OF VIRUS TROPISM AND HOST RANGE BY VIRAL GENOME SHUFFLING” WO 9923107; Apt et al., “HUMAN PAPILLOMAVIRUS VECTORS” WO 9921979; Del Cardayre et al. “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION” WO 9831837; Patten and Stemmer, “METHODS AND COMPOSITIONS FOR POLYPEPTIDE ENGINEERING” WO 9827230; and Stemmer et al., “METHODS FOR OPTIMIZATION OF GENE THERAPY BY RECURSIVE SEQUENCE SHUFFLING AND SELECTION” WO 9813487.

Certain U.S. Applications provide additional details regarding DNA shuffling and related techniques, including “SHUFFLING OF CODON ALTERED GENES” by Patten et al. filed Sep. 29, 1998, (U.S. Ser. No. 60/102,362), Jan. 29, 1999 (U.S. Ser. No. 60/117,729), and Sep. 28, 1999, U.S. Ser. No. PCT/US99/22588; “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION”, by del Cardayre et al. filed Jul. 15, 1998 (U.S. Ser. No. 09/166,188), and Jul. 15, 1999 (U.S. Ser. No. 09/354,922); “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Feb. 5, 1999 (U.S. Ser. No. 60/118,813) and filed Jun. 24, 1999 (U.S. Ser. No. 60/141,049) and filed Sep. 28, 1999 (U.S. Ser. No. 09/408,392), and “USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,393). Finally, the applications cited above in the section entitled “Cross Reference to Related Applications” provide relevant formats.

The foregoing references also provide additional details on the process of hybridizing and elongating nucleic acids to achieve nucleic acid recombination.

In one aspect, a hybrid method which uses family gene shuffling in combination with more traditional recombination based shuffling methods is used. For example, an active nucleic acid can be reassembled from oligonucleotides to have a few or even no homologous substitutions relative to a given target gene. The reassembled “backbone” nucleic acid is treated with DNase as in standard methods, and the resulting DNased fragments are spiked with family oligonucleotides comprising sequences corresponding to regions of sequence identity and diversity in a given nucleic acid. The nucleic acids are then reassembled into a library of homologous sequences by the methods below (e.g., PCR reassembly, or other reassembly methods). This procedure can result in an increase in the percentage of active clones which are found as compared to oligonucleotide synthetic methods which do not incorporate the use of a backbone nucleic acid.

A number of the publications of the inventors and their co-workers, as well as other investigators in the art, describe techniques which facilitate DNA shuffling, e.g., by providing for reassembly of genes from small fragments, including oligonucleotides, as relevant to the present invention. For example, Stemmer et al. (1998) U.S. Pat. No. 5,834,252 END COMPLEMENTARY POLYMERASE REACTION describe processes for amplifying and detecting a target sequence (e.g., in a mixture of nucleic acids), as well as for assembling large polynucleotides from fragments. Crameri et al. (1998) Nature 391: 288-291 provides basic methodologies for gene reassembly, as does Crameri et al. (1998) BioTechniques 18(2): 194-196.

Other diversity generating approaches can also be used to modify nucleic acids produced by the methods herein, or to be used as templates for the methods herein. For example, additional diversity can be introduced by methods which result in the alteration of individual nucleotides or groups of contiguous or non-contiguous nucleotides, i.e., mutagenesis methods. Mutagenesis methods include, for example, recombination (PCT/US98/05223; Publ. No. WO98/42727); oligonucleotide-directed mutagenesis (for review see, Smith, Ann. Rev. Genet. 19: 423-462 (1985); Botstein and Shortle, Science 229: 1193-1201 (1985); Carter, Biochem. J. 237: 1-7 (1986); Kunkel, “The efficiency of oligonucleotide directed mutagenesis” in Nucleic acids & Molecular Biology, Eckstein and Lilley, eds., Springer Verlag, Berlin (1987)). Included among these methods are oligonucleotide-directed mutagenesis (Zoller and Smith, Nucl. Acids Res. 10: 6487-6500 (1982), Methods in Enzymol. 100: 468-500 (1983), and Methods in Enzymol. 154: 329-350 (1987)), phosphorothioate-modified DNA mutagenesis (Taylor et al., Nucl. Acids Res. 13: 8749-8764 (1985); Taylor et al., Nucl. Acids Res. 13: 8765-8787 (1985); Nakamaye and Eckstein, Nucl. Acids Res. 14: 9679-9698 (1986); Sayers et al., Nucl. Acids Res. 16: 791-802 (1988); Sayers et al., Nucl. Acids Res. 16: 803-814 (1988)), mutagenesis using uracil-containing templates (Kunkel, Proc. Nat'l. Acad. Sci. USA 82: 488-492 (1985) and Kunkel et al., Methods in Enzymol. 154: 367-382); and mutagenesis using gapped duplex DNA (Kramer et al., Nucl. Acids Res. 12: 9441-9456 (1984); Kramer and Fritz, Methods in Enzymol. 154: 350-367 (1987); Kramer et al., Nucl. Acids Res. 16: 7207 (1988); and Fritz et al., Nucl. Acids Res. 16: 6987-6999 (1988)). Additional methods include point mismatch repair (Kramer et al., Cell 38: 879-887 (1984)), mutagenesis using repair-deficient host strains (Carter et al., Nucl. Acids Res. 13: 4431-4443 (1985); Carter, Methods in Enzymol. 154: 382-403 (1987)), deletion mutagenesis (Eghtedarzadeh and Henikoff, Nucl. Acids Res. 14: 5115 (1986)), restriction-selection and restriction-purification (Wells et al., Phil. Trans. R. Soc. Lond. A 317: 415-423 (1986)), and mutagenesis by total gene synthesis (Nambiar et al., Science 223: 1299-1301 (1984); Sakamar and Khorana, Nucl. Acids Res. 14: 6361-6372 (1988); Wells et al., Gene 34: 315-323 (1985); and Grundstrom et al., Nucl. Acids Res. 13: 3305-3316 (1985)). Kits for mutagenesis are commercially available (e.g., Bio-Rad, Amersham International, Anglian Biotechnology).

Other diversity generation procedures are proposed in U.S. Pat. Nos. 5,756,316; 5,965,408; Ostermeier et al. (1999) “A combinatorial approach to hybrid enzymes independent of DNA homology” Nature Biotech 17:1205; U.S. Pat. Nos. 5,783,431; 5,824,485; 5,958,672; Jirholt et al. (1998) “Exploiting sequence space: shuffling in vivo formed complementarity determining regions into a master framework” Gene 215: 471; U.S. Pat. No. 5,939,250; WO 99/10539; WO 98/58085; WO 99/10539 and others. These diversity generating methods can be combined with each other or with shuffling reactions or oligo shuffling methods, in any combination selected by the user, to produce nucleic acid diversity, which may be screened using any available screening method.

Following recombination or other diversification reactions, any nucleic acids which are produced can be selected for a desired activity. In the context of the present invention, this can include testing for and identifying any detectable or assayable activity, by any relevant assay in the art. A variety of related (or even unrelated) properties can be assayed for, using any available assay.

DNA Shuffling Without the Use of PCR

Although one preferred format for gene reassembly uses PCR, other formats are also useful. For example, site-directed or oligonucleotide-directed mutagenesis methods can be used to generate chimeras between 2 or more parental genes (whether homologous or non-homologous). In this regard, one aspect of the present invention relates to a new method of performing recombination between nucleic acids by ligation of libraries of oligonucleotides corresponding to the nucleic acids to be recombined.

In this format, a set of oligonucleotides which includes a plurality of nucleic acid sequences from a plurality of the parental nucleic acids is ligated to produce one or more recombinant nucleic acid(s), typically encoding a full length protein (although ligation can also be used to make libraries of partial nucleic acid sequences which can then be recombined, e.g., to produce a partial or full-length recombinant nucleic acid). The oligonucleotide set typically includes at least a first oligonucleotide which is complementary to at least a first of the parental nucleic acids at a first region of sequence diversity and at least a second oligonucleotide which is complementary to at least a second of the parental nucleic acids at a second region of diversity. The parental nucleic acids can be homologous or non-homologous.

Often, nucleic acids such as oligos are ligated with a ligase. In one typical format, oligonucleotides are hybridized to a first parental nucleic acid which acts as a template, and ligated with a ligase. The oligos can also be extended with a polymerase and ligated. The polymerase can be, e.g., an ordinary DNA polymerase or a thermostable DNA polymerase. The ligase can also be an ordinary DNA ligase, or a thermostable DNA ligase. Many such polymerases and ligases are commercially available.
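The combinatorial outcome of ligating oligonucleotides drawn from several parents can be sketched in Python as follows. This is an illustrative model only: the function names and segment boundaries are hypothetical, the parents are assumed to be aligned and of equal length, and hybridization, strand polarity and ligation efficiency are ignored.

```python
from itertools import product

def segment_variants(parents, boundaries):
    """Cut aligned, equal-length parental sequences at the given positions
    and collect the distinct oligo sequences seen for each segment."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    cuts = [0, *boundaries, length]
    variants = []
    for start, end in zip(cuts, cuts[1:]):
        distinct = []
        for parent in parents:
            segment = parent[start:end]
            if segment not in distinct:
                distinct.append(segment)
        variants.append(distinct)
    return variants

def ligation_library(parents, boundaries):
    """Enumerate every product obtained by joining one variant per segment."""
    return ["".join(combo)
            for combo in product(*segment_variants(parents, boundaries))]

if __name__ == "__main__":
    parents = ["AAAACCCCGGGG", "AAATCCCTGGGT"]  # toy sequences
    library = ligation_library(parents, [4, 8])
    print(len(library), "recombinants, e.g.", library[:3])
```

With two parents that differ in each of three segments, the sketch enumerates eight products: the two parental sequences and six chimeras.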

In one set of approaches, a common element for non-PCR based recombination methods is preparation of a single-stranded template to which primers are annealed and then elongated by a DNA polymerase in the presence of dNTPs and appropriate buffer. The gapped duplex can be sealed with ligase prior to transformation or electroporation into E. coli. The newly synthesized strand is replicated and generates a chimeric gene with contributions from the oligo in the context of the single-stranded (ss) parent.

For example, the ss template can be prepared by incorporation of the phage IG region into a plasmid and use of a helper phage such as M13KO7 (Pharmacia Biotech) or R408 to package ss plasmids into filamentous phage particles. The ss template can also be generated by denaturation of a double-stranded template and annealing in the presence of the primers. The methods vary in the enrichment methods for isolation of the newly synthesized chimeric strand over the parental template strand. Isolation and selection of double stranded templates can be performed using available methods. See, e.g., Ling et al. (1997) “Approaches to DNA mutagenesis: an overview.” Anal Biochem. December 15; 254(2):157-78; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol Biol. 57:369-74; Smith (1985) “In vitro mutagenesis” Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) “Strategies and applications of in vitro mutagenesis” Science 229:1193-1201; Carter (1986) “Site-directed mutagenesis” Biochem J. 237:1-7; and Kunkel (1987) “The efficiency of oligonucleotide directed mutagenesis” in Nucleic Acids & Molecular Biology, Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin.

For example, in one aspect, a “Kunkel style” method uses uracil containing templates. Similarly, the “Eckstein” method uses phosphorothioate-modified DNA (Taylor et al. (1985) “The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA.” Nucleic Acids Res. 13:8749-8764; Taylor et al. (1985) “The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucleic Acids Res. 13:8765-8787; Nakamaye & Eckstein (1986) “Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis.” Nucleic Acids Res. 14:9679-9698; Sayers et al. (1988) “5′-3′ Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis.” Nucleic Acids Res. 16:791-802; Sayers et al. (1988) “5′-3′ Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide” Nucleic Acids Res. 16:803-814). Restriction selection or, e.g., restriction purification can be used in conjunction with mismatch repair deficient strains (see, e.g., Carter et al. (1985) “Improved oligonucleotide site directed mutagenesis using M13 vectors” Nucleic Acids Res. 13:4431-4443; Carter (1987) “Improved oligonucleotide-directed mutagenesis using M13 vectors.” Methods in Enzymol. 154:382-403; Wells (1986) “Importance of hydrogen bond formation in stabilizing the transition state of subtilisin.” Phil. Trans. R. Soc. Lond. A 317:415-423).

The “mutagenic” primer used in these methods can be a synthetic oligonucleotide encoding any type of randomization, insertion, deletion, family gene shuffling oligonucleotide based on sequence diversity of homologous genes, etc. The primer(s) could also be fragments of homologous genes that are annealed to the ss parent template. In this way chimeras between 2 or more parental genes can be generated.

Multiple primers can anneal to a given template and be extended to create multiply chimeric genes. DNA polymerases such as those from phages T4 or T7 are suitable for this purpose, as they do not degrade or displace a downstream primer from the template.

For example, in one aspect, DNA shuffling is performed using uracil containing templates. In this embodiment, the gene of interest is cloned into an E. coli plasmid containing the filamentous phage intergenic (IG, ori) region. Single stranded (ss) plasmid DNA is packaged into phage particles upon infection with a helper phage such as M13KO7 (Pharmacia) or R408 and can be easily purified by methods such as phenol/chloroform extraction and ethanol precipitation. If this DNA is prepared in a dut- ung- strain of E. coli, a small number of uracil residues are incorporated into it in place of the normal thymine residues. One or more primers or other oligos as described above are annealed to the ss uracil-containing template by heating to 90° C. and slowly cooling to room temperature. An appropriate buffer containing all 4 deoxyribonucleotides, T7 DNA polymerase and T4 DNA ligase is added to the annealed template/primer mix and incubated between room temperature and, e.g., about 37° C. for ≥1 hour. The T7 DNA polymerase extends from the 3′ end of the primer and synthesizes a complementary strand to the template incorporating the primer. DNA ligase seals the gap between the 3′ end of the newly synthesized strand and the 5′ end of the primer.

If multiple primers are used, then the polymerase will extend to the next primer and stop, and ligase will seal the gap. This reaction is then transformed into an ung+ strain of E. coli and antibiotic selection for the plasmid is applied. The uracil N-glycosylase (ung gene product) enzyme in the host cell recognizes the uracil in the template strand and removes it, creating apyrimidinic sites that either are not replicated or are corrected by the host repair systems using the newly synthesized strand as a template. The resulting plasmids predominantly contain the desired change in the gene of interest. If multiple primers are used then it is possible to simultaneously introduce numerous changes in a single reaction. If the primers are derived from or correspond to fragments of homologous genes, then multiply chimeric genes can be generated.

Codon Modification

In one aspect, the oligonucleotides utilized in the methods herein have altered codon use as compared to the parental sequences from which the oligonucleotides are derived. In particular, it is useful, e.g., to modify codon preference to optimize expression in a cell in which a recombinant product of an oligonucleotide shuffling procedure is to be assessed or otherwise selected. Conforming a recombinant nucleic acid to the codon bias of a particular cell in which selection is to take place typically results in maximization of expression of the recombinant nucleic acid. Because the oligonucleotides used in the various strategies herein typically are made synthetically, selecting optimal codon preference is done simply by reference to well-known codon-bias tables. Codon-based synthetic methods, as described supra, are optionally used to modify codons in synthetic protocols.

In addition to the selection of oligonucleotide sequences to optimize expression, codon preference can also be used to increase sequence similarity between distantly related nucleic acids which are to be recombined. By selecting which codons are used in particular positions, it is possible to increase the similarity between the nucleic acids, which, in turn, increases the frequency of recombination between the nucleic acids. Additional details on codon modification procedures and their application to DNA shuffling are found in Paten and Stemmer, U.S. Ser. No. 60/102,362 “SHUFFLING OF CODON ALTERED NUCLEIC ACIDS,” filed Sep. 29, 1998 and related application of Paten and Stemmer, Attorney docket number 018097-028510, entitled “SHUFFLING OF CODON ALTERED NUCLEIC ACIDS,” filed Jan. 29, 1999.
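
One way to picture codon selection for similarity is the following Python sketch, which recodes one coding sequence with synonymous codons so that it resembles a second, codon-aligned sequence as closely as possible. The sketch assumes the standard genetic code, two in-frame sequences of equal length with no gaps, and hypothetical function names; it is not presented as the procedure of the cited applications.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumed for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def identity(a, b):
    """Count positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def recode_for_similarity(seq, reference):
    """Recode `seq` with synonymous codons chosen to match `reference`."""
    if len(seq) != len(reference) or len(seq) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    recoded = []
    for i in range(0, len(seq), 3):
        residue = CODON_TO_AA[seq[i:i + 3]]
        target = reference[i:i + 3]
        synonyms = [c for c, aa in CODON_TO_AA.items() if aa == residue]
        # Keep the encoded residue, but pick the codon closest to the reference.
        recoded.append(max(synonyms, key=lambda c: identity(c, target)))
    return "".join(recoded)


seq = "TTAAGCCGT"        # encodes Leu-Ser-Arg
reference = "CTGACGAGA"  # encodes Leu-Thr-Arg
recoded = recode_for_similarity(seq, reference)
```

In this toy case the recoded sequence still encodes Leu-Ser-Arg, while its nucleotide identity to the reference rises from 3 of 9 positions to 8 of 9.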

Length Variation by Modular Shuffling

Many functional sequence domains for genes and gene elements are composed of functional subsequence domains. For example, promoter sequences are made up of a number of functional sequence elements which bind transcription factors, which, in turn, regulate gene expression. Enhancer elements can be combined with promoter elements to enhance expression of a given gene. Similarly, at least some exons represent modular domains of an encoded protein, and exons can be multimerized or deleted relative to a wild-type gene and the resulting nucleic acids recombined to provide libraries of altered gene (or encoded protein) modules (i.e., libraries of module inserted or deleted nucleic acids). The number and arrangement of modular sequences, as well as their sequence composition, can affect the overall activity of the promoter, exon, or other genetic module.

The concept of exons as modules of genes and encoded proteins is established, particularly for proteins which have developed in eukaryotes. See, e.g., Gilbert and Glynias (1993) Gene 137-144; Doolittle and Bork (October 1993) Scientific American 50-56; and Patthy (1991) Current Opinion in Structural Biology 1:351-361. Shuffling of exon modules is optimized by an understanding of exon shuffling rules. Introns (and consequently exons) occur in three different phases, depending on the splice junction of a codon at the exon-intron boundary. See, Stemmer (1995) Biotechnology 13:549-553; Patthy (1994) Current Opinion in Structural Biology 4:383-392 and Patthy (1991) Current Opinion in Structural Biology 1:351-361.

In nature, splice junctions of shuffled exons have to be “phase compatible” with those of neighboring exons; if not, then a shift in reading frame occurs, eliminating the information of the exon module. The three possible phases of an intron are phases 1, 2, or 0, for the base position within the codon at the intron-exon boundary in which the intron occurs. Classification of introns according to their location relative to the reading frame is as follows: a phase 0 intron is located between two codons of flanking exons; a phase 1 intron is located between the first and second nucleotide of a codon and a phase 2 intron is located between the second and third nucleotide of a codon. Phase 1 introns are the most common in nature.
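
The phase classification above amounts to simple modular arithmetic, as the following Python sketch suggests. It assumes a coding sequence that begins in frame at the first base of the first exon; the function names and exon lengths are hypothetical.

```python
def intron_phase(coding_bases_upstream):
    """Phase of an intron that follows the given number of coding bases.

    0: between two codons; 1: after the first base of a codon;
    2: after the second base of a codon.
    """
    if coding_bases_upstream < 0:
        raise ValueError("base count cannot be negative")
    return coding_bases_upstream % 3


def exon_phases(exon_lengths):
    """Return (start_phase, end_phase) for each exon of an in-frame CDS."""
    phases = []
    upstream = 0
    for length in exon_lengths:
        start = intron_phase(upstream)
        upstream += length
        phases.append((start, intron_phase(upstream)))
    return phases


# Hypothetical exon lengths, in bases of coding sequence.
phases = exon_phases([100, 90, 110])
```

Here the middle exon is flanked by two phase 1 introns, which makes it a candidate for duplication or deletion without a shift in reading frame.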

One aspect of the present invention is the shuffling of modular sequences (including, e.g., promoter elements and exons) to vary the sequence of such modules, the number of repeats of modules (from 0 (i.e., a deletion of the element) to a desired number of copies) and the length of the modules. In particular, standard shuffling methods, and/or the oligonucleotide-mediated methods herein, can be combined with element duplication and length variation approaches simply by spiking appropriately designed fragments or oligonucleotides into a recombination mixture.

For example, a PCR-generated fragment containing the element to be repeated is spiked into a recombination reaction, with ends designed to be complementary, causing the creation of multimers in a subsequent recombination reaction. These multimers can be incorporated into final shuffled products by homologous recombination at the ends of the multimers, with the overall lengths of such multimers being dependent on the molar ratios of the modules to be multimerized. The multimers can be made separately, or can be oligos in a gene reassembly/recombination reaction as discussed supra.

In a preferred aspect, oligos are selected to generate multimers and/or to delete selected modules such as exons, promoter elements, enhancers, or the like during oligonucleotide recombination and gene assembly, thereby avoiding the need to make multimers or nucleic acids comprising module deletions separately. Thus, in one aspect, a set of overlapping family gene shuffling oligonucleotides is constructed to comprise oligos which provide for deletion or multimerization of sequence module elements. These “module shuffling” oligonucleotides can be used in conjunction with any of the other approaches herein to recombine homologous nucleic acids. Thus, sequence module elements are those subsequences of a given nucleic acid which provide an activity or distinct component of an activity of a selected nucleic acid, while module shuffling oligonucleotides are oligonucleotides which provide for insertion, deletion or multimerization of sequence modules. Examples of such oligonucleotides include those having subsequences corresponding to more than one sequence module (providing for deletion of intervening sequences and/or insertion of a module in a selected position), one or more oligonucleotides with ends that have regions of identity permitting multimerization of the one or more oligonucleotides (and, optionally, of associated sequences) during hybridization and elongation of a mixture of oligonucleotides, and the like.

Libraries resulting from module deletion/insertion strategies noted above vary in the number of copies and arrangement of a given module relative to a corresponding or homologous parental nucleic acid. When the modules are exons, the oligonucleotides used in the recombination methods are typically selected to result in exons being joined in the same phase (i.e., having the same reading frame) to increase the likelihood that any given library member will be functionally active. This is illustrated schematically in FIG. 3. The differently shaded modules represent separate exons, with the phase of the exon being indicated as 1, 2, or 0.
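
A check of this kind can be sketched in Python as follows. Each module is represented, purely for illustration, as a tuple of a name, the phase at its 5′ boundary and the phase at its 3′ boundary; the module names and phases are hypothetical and are not taken from FIG. 3.

```python
def joined_in_phase(modules):
    """True if every junction joins a 3' phase to an identical 5' phase.

    `modules` is an ordered list of (name, start_phase, end_phase) tuples.
    """
    return all(left[2] == right[1] for left, right in zip(modules, modules[1:]))


# Hypothetical exon modules for illustration.
exon_a = ("A", 0, 1)
exon_b = ("B", 1, 1)  # symmetric exon: same phase at both boundaries
exon_c = ("C", 1, 0)
exon_d = ("D", 2, 0)

parental = joined_in_phase([exon_a, exon_b, exon_c])
duplicated = joined_in_phase([exon_a, exon_b, exon_b, exon_c])
deleted = joined_in_phase([exon_a, exon_c])
mismatched = joined_in_phase([exon_a, exon_d])
```

Under these assumptions, duplicating or deleting the symmetric exon B keeps the reading frame, whereas joining A directly to D does not, so oligonucleotides producing the A-D junction would be left out of the set.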

Shuffling of Cladistic Intermediates

The present invention provides for the shuffling of “evolutionary intermediates.” In the context of the present invention, evolutionary intermediates are artificial constructs which are intermediate in character between two or more homologous sequences, e.g., when the sequences are grouped in an evolutionary dendrogram.

Nucleic acids are often classified into evolutionary dendrograms (or “trees”) showing evolutionary branch points and, optionally, relatedness. For example, cladistic analysis is a classification method in which organisms or traits (including nucleic acid or polypeptide sequences) are ordered and ranked on a basis that reflects origin from a postulated common ancestor (an intermediate form of the divergent traits or organisms). Cladistic analysis is primarily concerned with the branching of relatedness trees (or “dendrograms”) which show relatedness, although the degree of difference can also be assessed. A distinction is sometimes made between evolutionary taxonomists, who consider degrees of difference, and those who simply determine branch points in an evolutionary dendrogram (classical cladistic analysis); for purposes of the present invention, however, relatedness trees produced by either method can produce evolutionary intermediates.

Cladistic or other evolutionary intermediates can be determined by selecting nucleic acids which are intermediate in sequence between two or more extant nucleic acids. Although the sequence may not exist in nature, it still represents a sequence which is similar to a sequence in nature which had been selected for, i.e., an intermediate of two or more sequences represents a sequence similar to the common ancestor of the two or more extant nucleic acids. Thus, evolutionary intermediates are one preferred shuffling substrate, as they represent “pseudo selected” sequences, which are more likely than randomly selected sequences to have activity.

One benefit of using evolutionary intermediates as substrates for shuffling (or of using oligonucleotides which correspond to such sequences) is that considerable sequence diversity can be represented in fewer starting substrates (i.e., if starting with parents A and B, a single intermediate “C” has at least a partial representation of both A and B). This simplifies the oligonucleotide synthesis for gene reconstruction/recombination methods, improving the efficiency of the procedure. Further, searching sequence databases with evolutionary intermediates increases the chances of identifying related nucleic acids using standard search programs such as BLAST.

Intermediate sequences can also be selected between two or more synthetic sequences which are not represented in nature, simply by starting from two synthetic sequences. Such synthetic sequences can include evolutionary intermediates, proposed gene sequences, or other sequences of interest that are related by sequence. These “artificial intermediates” are also useful in reducing the complexity of gene reconstruction methods and for improving the ability to search evolutionary databases.

Accordingly, in one significant embodiment of the invention, character strings representing evolutionary or artificial intermediates are first determined using alignment and sequence relationship software (BLAST, PILEUP, etc.) and then synthesized using oligonucleotide reconstruction methods. Alternately, the intermediates can form the basis for selection of oligonucleotides used in the gene reconstruction methods herein.
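
As a rough illustration of deriving an intermediate character string from aligned sequences, the following Python sketch takes a majority-rule consensus of a gap-free alignment. This is an assumption made for the example: a consensus is only a crude stand-in for an evolutionary intermediate and is not the output of BLAST, PILEUP or any phylogenetic reconstruction. The function name and input sequences are hypothetical.

```python
from collections import Counter


def consensus_intermediate(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequence strings.

    Ties are resolved in favor of the earliest-listed sequence.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        result.append(next(ch for ch in column if counts[ch] == top))
    return "".join(result)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical aligned parental sequences.
parents = ["ATGGCA", "ATCGCT", "TTGGCT"]
intermediate = consensus_intermediate(parents)
```

In this toy case the intermediate differs from each parent at one position, while the parents differ from one another at two, so a single string represents part of the diversity of all three.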

Further details regarding advanced procedures for generating cladistic intermediates, including in silico shuffling using hidden Markov model threading, are set forth in co-filed application “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., Attorney Docket Number 02-289-30US and in co-filed PCT application (designating the United States) “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., Attorney Docket Number 02-289-30PC.

Protein Domain Shuffling

Family shuffling of genes is a good way to access functional diversity of encoded proteins. It can be advantageous, however, to shuffle only a portion of an encoded protein which provides an activity of interest, particularly where the protein is multifunctional and one or more activities can be mapped to a subsequence (a domain) of the overall protein.

For example, enzymes such as glycosyl transferases have two substrates: the acceptor and the activated sugar donor. To change the sugar to be transferred without altering the acceptor, it can be preferable to family shuffle only the sugar binding domain, since family shuffling the sugar acceptor domain can result in lowered numbers of the desired acceptor.

In one example, there are 5 enzymes, eA-eE (each of 500 amino acids), that transfer sugars a-e to acceptors A-E. To generate a library of enzymes that transfer sugars a-e to acceptor A, it can be preferable to shuffle the sugar binding domains of eA-eE, combining them with the acceptor binding domain of eA.

One technical challenge in practicing this strategy is that there can be insufficient data to identify such functional domains in a protein of interest. When this is the case, a set of libraries can be generated by family shuffling random portions of the enzyme. For example, as applied to the family shuffling of enzymes eA-eE, above, a first library can be made encoding the first 100 amino acids of eA-eE, in combination with the last 400 amino acids of any one of eA-eE, by appropriately selecting oligonucleotide sets for recombination and elongation. A second library can be made which family shuffles the second 100 amino acids of eA-eE, in combination with encoding the first 100 amino acids of any one of eA-eE and the last 300 amino acids of any one of eA-eE, and so on. Small subsets of these libraries are screened for a first desired function. Libraries that have retained the first desired function (e.g., acceptor activity in the example above) have a relatively higher proportion of variants in additional selectable functions (e.g., sugar transfer in the example above).
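
The layout of such a set of libraries can be tabulated with a short Python sketch. The function name and dictionary keys are hypothetical, and the sketch only enumerates which residues are shuffled and which are held fixed; it does not select oligonucleotides.

```python
def window_libraries(protein_length, window):
    """List, per library, the shuffled residue range and the fixed flanks.

    Residue numbers are 1-based and inclusive.
    """
    libraries = []
    for start in range(0, protein_length, window):
        end = min(start + window, protein_length)
        libraries.append({
            "shuffled": (start + 1, end),
            "fixed_n_terminal": start,
            "fixed_c_terminal": protein_length - end,
        })
    return libraries


# The 500 amino acid enzymes of the example, shuffled 100 residues at a time.
libraries = window_libraries(500, 100)
```

For the 500 amino acid enzymes above this gives five libraries, the first shuffling residues 1-100 with 400 fixed C-terminal residues and the second shuffling residues 101-200 with 100 fixed N-terminal and 300 fixed C-terminal residues.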

This approach can be used for diversification of any multi-functional protein in which one property is desirably conserved. This strategy is particularly advantageous when the property to be conserved is complex (e.g., substrate specificity for, e.g., polypeptides, non-ribosomal peptides or other natural products).

In general, selection of oligonucleotides to provide shuffling of individual domains (whether corresponding to known functional subsequences or to subsequences of unknown function as noted above) is performed by providing two general types of sequence-related oligonucleotides. The first type is provided by selecting sequence-related overlapping oligonucleotide sets corresponding to regions where recombination is desired (according to the strategies noted herein), while the second type provides recombination between the domains to be shuffled and non-shuffled domains, i.e., similar to a crossover oligonucleotide as described herein. The non-shuffled domains can be produced by simple oligonucleotide gene reconstruction methods (e.g., using ligation or polymerase-mediated extension reactions to concatenate oligonucleotides), or the non-shuffled domains can be produced by enzymatic cleavage of larger nucleic acids.

Expanded Family Shuffling Incorporating Molecular Modeling and Alanine Scanning

Family based oligo shuffling involves the recombination of homologous nucleic acids by making sets of family shuffling oligonucleotides which are recombined in gene synthesis and recombination protocols as discussed supra. As noted, the homologous nucleic acids can be natural or non-natural (i.e., artificial) homologues.

One advantage of recombining non-natural homologues is that sequence space other than naturally occurring sequence space is accessed by the resulting recombinant nucleic acids. This additional diversity allows for the development or acquisition of functional properties that are not provided by recombination of nucleic acids representing natural diversity.

The main disadvantage of creating random homologues for recombination is that many of the resulting homologues are not functional with respect to a relevant characteristic. For these homologues, much of the resulting increase in selectable sequence space is undesirable “noise” which has to be selected out of the population. In contrast, natural diversity represents evolutionarily tested molecules, representing a more targeted overall potential sequence space in which recombination occurs.

One way of capturing non-natural diversity without significantly increasing undesirable sequence space is to define those positions which can be modified in a given gene without significantly degrading the desired functional property of an encoded molecule (protein, RNA, etc.). At least two basic approaches can be used.

First, point mutagenesis (e.g., alanine scanning) can be performed to define positions that can be mutated without a significant loss of function. In principle, all 20 amino acids could be tested at each position to define a large spectrum of point mutations that are essentially neutral with respect to function. Sets of shuffling oligos are then made which capture these non-natural (but still active) homologues. For many commercially important proteins, alanine scanning information is already available. For example, Young et al. (1997) Protein Science 6:1228-1236 describe alanine scanning of granulocyte colony stimulating factor (G-CSF).
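
The bookkeeping behind an alanine scan can be sketched in Python as follows. The peptide, the activity values and the 0.8 cutoff are all made up for illustration and do not come from the G-CSF study cited above; the function names are likewise hypothetical.

```python
def alanine_scan(protein):
    """Map mutation names (e.g., 'K2A') to single alanine-substituted variants."""
    variants = {}
    for position, residue in enumerate(protein, start=1):
        if residue != "A":  # positions that are already alanine are skipped
            variants[f"{residue}{position}A"] = (
                protein[:position - 1] + "A" + protein[position:]
            )
    return variants


def neutral_mutations(relative_activity, threshold=0.8):
    """Names of mutations retaining at least `threshold` of parental activity."""
    return sorted(name for name, value in relative_activity.items()
                  if value >= threshold)


variants = alanine_scan("MKAL")

# Hypothetical screening results (fraction of parental activity retained).
measured = {"M1A": 0.05, "K2A": 0.95, "L4A": 0.85}
tolerated = neutral_mutations(measured)
```

Positions found to tolerate substitution in this way are those at which shuffling oligos can introduce non-natural residues.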

Second, where structural information is available for a protein (and, e.g., how the protein interacts with a ligand), regions can be defined which are predicted to be mutable with little or no change in function. Sets of family shuffling oligos are then made which capture these non-natural (but still predicted to be active) homologues. A variety of protein crystal structures are available (including, e.g., the crystal structure of G-CSF: Hill et al. (1993) PNAS 90:5167).

Similarly, even where structural information is not available, molecular modeling can be performed to provide a predicted structure, which can also be used to predict which residues can be changed without altering function. A variety of protein structure modeling programs are commercially available for predicting protein structure. Further, the relative tendencies of amino acids to form regions of superstructure (helixes, β-sheets, etc.) are well established. For example, O'Neil and DeGrado Science v.250 provide a discussion of the helix forming tendencies of the commonly occurring amino acids. Tables of relative structure forming activity for amino acids can be used as substitution tables to predict which residues can be functionally substituted in a given portion. Sets of family shuffling oligos are then made which capture these non-natural (but still predicted to be active) homologues.

For example, Protein Design Automation (PDA) is one computationally driven system for the design and optimization of proteins and peptides. Typically, PDA starts with a protein backbone structure and designs the amino acid sequence to modify the protein's properties, while maintaining its three dimensional folding properties. Large numbers of sequences can be manipulated using PDA, allowing for the design of protein structures (sequences, subsequences, etc.). PDA is described in a number of publications, including, e.g., Malakauskas and Mayo (1998) “Design, Structure and Stability of a Hyperthermophilic Protein Variant” Nature Struc. Biol. 5:470; Dahiyat and Mayo (1997) “De Novo Protein Design: Fully Automated Sequence Selection” Science, 278:82-87; DeGrado (1997) “Proteins from Scratch” Science, 278:80-81; Dahiyat, Sarisky and Mayo (1997) “De Novo Protein Design: Towards Fully Automated Sequence Selection” J. Mol. Biol. 273:789-796; Dahiyat and Mayo (1997) “Probing the Role of Packing Specificity in Protein Design” Proc. Natl. Acad. Sci. USA, 94:10172-10177; Hellinga (1997) “Rational Protein Design: Combining Theory and Experiment” Proc. Natl. Acad. Sci. USA, 94:10015-10017; Su and Mayo (1997) “Coupling Backbone Flexibility and Amino Acid Sequence Selection in Protein Design” Prot. Sci. 6:1701-1707; Dahiyat, Gordon and Mayo (1997) “Automated Design of the Surface Positions of Protein Helices” Prot. Sci., 6:1333-1337; Dahiyat and Mayo (1996) “Protein Design Automation” Prot. Sci., 5:895-903. Additional details regarding PDA are available, e.g., at http://www.xencor.com/. PDA can be used to identify variants of a sequence that are likely to retain activity, providing a set of homologous nucleic acids that can be used as a basis for oligonucleotide mediated recombination.

Post-Recombination Screening Techniques

The precise screening method that is used in the various shuffling procedures herein is not a critical aspect of the invention. In general, one of skill can practice appropriate screening (i.e., selection) methods, by reference to the activity to be selected for.

In any case, one or more recombination cycle(s) is/are usually followed by one or more cycles of screening or selection for molecules or transformed cells or organisms having a desired property, trait or characteristic. If a recombination cycle is performed in vitro, the products of recombination, i.e., recombinant segments, are sometimes introduced into cells before the screening step. Recombinant segments can also be linked to an appropriate vector or other regulatory sequences before screening. Alternatively, products of recombination generated in vitro are sometimes packaged in viruses (e.g., bacteriophage) before screening. If recombination is performed in vivo, recombination products can sometimes be screened in the cells in which recombination occurred. In other applications, recombinant segments are extracted from the cells, and optionally packaged as viruses, before screening.

The nature of screening or selection depends on what property or characteristic is to be acquired or the property or characteristic for which improvement is sought, and many examples are discussed below. It is not usually necessary to understand the molecular basis by which particular products of recombination (recombinant segments) have acquired new or improved properties or characteristics relative to the starting substrates. For example, a gene can have many component sequences, each having a different intended role (e.g., coding sequence, regulatory sequences, targeting sequences, stability-conferring sequences, subunit sequences and sequences affecting integration). Each of these component sequences can be varied and recombined simultaneously. Screening/selection can then be performed, for example, for recombinant segments that have increased ability to confer activity upon a cell without the need to attribute such improvement to any of the individual component sequences of the vector.

Depending on the particular screening protocol used for a desired property, initial round(s) of screening can sometimes be performed using bacterial cells due to high transfection efficiencies and ease of culture. However, bacterial expression is often not practical or desired, and yeast, fungal or other eukaryotic systems are also used for library expression and screening. Similarly, other types of screening which are not amenable to screening in bacterial or simple eukaryotic library cells are performed in cells selected for use in an environment close to that of their intended use. Final rounds of screening can be performed in the precise cell type of intended use.

One approach to screening diverse libraries is to use a massively parallel solid-phase procedure to screen shuffled nucleic acid products, e.g., encoded enzymes, for enhanced activity. Massively parallel solid-phase screening apparatus using absorption, fluorescence, or FRET are available. See, e.g., U.S. Pat. No. 5,914,245 to Bylina, et al. (1999); see also, http://www.kairos-scientific.com/; Youvan et al. (1999) “Fluorescence Imaging Micro-Spectrophotometer (FIMS)” Biotechnology et alia, <www.et-al.com> 1:1-16; Yang et al. (1998) “High Resolution Imaging Microscope (HIRIM)” Biotechnology et alia, <www.et-al.com> 4:1-20; and Youvan et al. (1999) “Calibration of Fluorescence Resonance Energy Transfer in Microscopy Using Genetically Engineered GFP Derivatives on Nickel Chelating Beads” posted at www.kairos-scientific.com. Following screening by these techniques, sequences of interest are typically isolated, optionally sequenced and the sequences used as set forth herein to design new oligonucleotide shuffling methods.

If further improvement in a property is desired, at least one and usually a collection of recombinant segments surviving a first round of screening/selection are subject to a further round of recombination. These recombinant segments can be recombined with each other or with exogenous segments representing the original substrates or further variants thereof. Again, recombination can proceed in vitro or in vivo. If the previous screening step identifies desired recombinant segments as components of cells, the components can be subjected to further recombination in vivo, or can be subjected to further recombination in vitro, or can be isolated before performing a round of in vitro recombination. Conversely, if the previous screening step identifies desired recombinant segments in naked form or as components of viruses, these segments can be introduced into cells to perform a round of in vivo recombination. The second round of recombination, irrespective of how it is performed, generates further recombinant segments which encompass more diversity than is present in recombinant segments resulting from previous rounds.

The second round of recombination can be followed by a further round of screening/selection according to the principles discussed above for the first round. The stringency of screening/selection can be increased between rounds. Also, the nature of the screen and the property being screened for can vary between rounds if improvement in more than one property is desired or if acquiring more than one new property is desired. Additional rounds of recombination and screening can then be performed until the recombinant segments have sufficiently evolved to acquire the desired new or improved property or function.
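
The alternation of screening and recombination described above can be pictured with a toy Python simulation. Everything in it is an assumption made for illustration: sequences are modeled as strings, recombination as a single random crossover, screening as ranking by a made-up score (the number of G characters), and the number of survivors per round is held fixed rather than made more stringent.

```python
import random


def recombine(parent_a, parent_b, rng):
    """Single crossover between two equal-length sequence strings."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def evolve(population, score, rounds, keep, rng):
    """Alternate screening (retain the top `keep`) with recombination."""
    population = list(population)
    for _ in range(rounds):
        survivors = sorted(population, key=score, reverse=True)[:keep]
        children = [
            recombine(rng.choice(survivors), rng.choice(survivors), rng)
            for _ in range(len(population) - keep)
        ]
        # Survivors are carried forward, so the best score never decreases.
        population = survivors + children
    return sorted(population, key=score, reverse=True)


def toy_score(seq):
    """Stand-in for an assayed property: count of 'G' characters."""
    return seq.count("G")


start = ["GGGGAAAA", "AAAAGGGG", "GGAAGGAA", "AAGGAAGG"]
final = evolve(start, toy_score, rounds=5, keep=2, rng=random.Random(0))
```

Because survivors of each screen are recombined with one another, later populations can contain combinations that no starting sequence carried; lowering `keep` between rounds would model an increase in stringency.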

Post-Shuffling Procedures

The nucleic acids produced by the methods of the invention are optionally cloned into cells for activity screening (or used in in vitro transcription reactions to make products which are screened). Furthermore, the nucleic acids can be sequenced, expressed, amplified in vitro or treated in any other common recombinant method.

General texts which describe molecular biological techniques useful herein, including cloning, mutagenesis, library construction, screening assays, cell culture and the like include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 (“Sambrook”) and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1998) (“Ausubel”). Methods of transducing cells, including plant and animal cells, with nucleic acids are generally available, as are methods of expressing proteins encoded by such nucleic acids. In addition to Berger, Ausubel and Sambrook, useful general references for culture of animal cells include Freshney (Culture of Animal Cells, a Manual of Basic Technique, third edition Wiley-Liss, New York (1994)) and the references cited therein, Humason (Animal Tissue Techniques, fourth edition W.H. Freeman and Company (1979)) and Ricciardelli, et al., In Vitro Cell Dev. Biol. 25:1016-1024 (1989). References for plant cell cloning, culture and regeneration include Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems John Wiley & Sons, Inc. New York, N.Y. (Payne); and Gamborg and Phillips (eds) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York) (Gamborg). A variety of cell culture media are described in Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, Fla. (Atlas). Additional information for plant cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from Sigma-Aldrich, Inc (St Louis, Mo.) (Sigma-LSRCCC) and, e.g., the Plant Culture Catalogue and supplement (1997) also from Sigma-Aldrich, Inc (St Louis, Mo.) (Sigma-PCCS).

Examples of techniques sufficient to direct persons of skill through in vitro amplification methods useful, e.g., for amplifying oligonucleotide shuffled nucleic acids, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook, and Ausubel, id., as well as in Mullis et al., (1987) U.S. Pat. No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, Calif. (1990) (Innis); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem 35, 1826; Landegren et al., (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace, (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117, and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.

In one preferred method, reassembled sequences are checked for incorporation of family gene shuffling oligonucleotides. This can be done by cloning and sequencing the nucleic acids, and/or by restriction digestion, e.g., as essentially taught in Sambrook, Berger and Ausubel, above. In addition, sequences can be PCR amplified and sequenced directly. Thus, in addition to, e.g., Sambrook, Berger, Ausubel and Innis (id. and above), additional PCR sequencing methodologies are also particularly useful. For example, direct sequencing of PCR generated amplicons by selectively incorporating boronated nuclease resistant nucleotides into the amplicons during PCR and digestion of the amplicons with a nuclease to produce sized template fragments has been performed (Porter et al. (1997) Nucleic Acids Research 25(8):1611-1617). In the methods, 4 PCR reactions on a template are performed, in each of which one of the nucleotide triphosphates in the PCR reaction mixture is partially substituted with a 2′-deoxynucleoside 5′-[P-borano]-triphosphate. The boronated nucleotide is stochastically incorporated into PCR products at varying positions along the PCR amplicon in a nested set of PCR fragments of the template. An exonuclease which is blocked by incorporated boronated nucleotides is used to cleave the PCR amplicons. The cleaved amplicons are then separated by size using polyacrylamide gel electrophoresis, providing the sequence of the amplicon. An advantage of this method is that it uses fewer biochemical manipulations than performing standard Sanger-style sequencing of PCR amplicons.

In Silico Shuffling

“In silico” shuffling utilizes computer algorithms to perform “virtual” shuffling using genetic operators in a computer. As applied to the present invention, gene sequence strings are recombined in a computer system and desirable products are made, e.g., by reassembly PCR of synthetic oligonucleotides as described herein. In silico shuffling is described in detail in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., attorney docket number 02-289-3US, filed herewith.

In brief, genetic operators (algorithms which represent given genetic events such as point mutations, recombination of two strands of homologous nucleic acids, etc.) are used to model recombinational or mutational events which can occur in one or more nucleic acids, e.g., by aligning nucleic acid sequence strings (using standard alignment software, or by manual inspection and alignment) such as those representing homologous nucleic acids and predicting recombinational outcomes. The predicted recombinational outcomes are used to produce corresponding molecules, e.g., by oligonucleotide synthesis and reassembly PCR.
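
Two such genetic operators can be sketched in Python as simple string functions. The sketch assumes sequence strings that have already been aligned to equal length without gaps; the function names and example strings are hypothetical and are not drawn from the co-filed application.

```python
def crossover(parent_a, parent_b, points):
    """Recombine two aligned strings, switching parents at each crossover point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    parents = (parent_a, parent_b)
    pieces, current, previous = [], 0, 0
    for point in sorted(points) + [len(parent_a)]:
        pieces.append(parents[current][previous:point])
        current ^= 1  # switch to the other parent after each crossover
        previous = point
    return "".join(pieces)


def point_mutation(seq, position, base):
    """Replace the character at a 0-based position with `base`."""
    if not 0 <= position < len(seq):
        raise IndexError("position outside sequence")
    return seq[:position] + base + seq[position + 1:]


recombinant = crossover("AAAAAAAA", "CCCCCCCC", points=[3, 6])
mutant = point_mutation(recombinant, 0, "G")
```

Each predicted string can then be broken into overlapping oligonucleotides for synthesis and reassembly.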

Integrated Assays and Integrated System Elements

As noted throughout, one preferred aspect of the present invention is the alignment of nucleic acids using a computer and sequence alignment software. Similarly, computers having appropriate software can be used to perform “in silico” shuffling prior to physical oligonucleotide synthesis. In addition, other important integrated system components can provide for high-throughput screening assays, as well as the coupling of such assays to oligonucleotide selection, synthesis and recombination.

Of course, the relevant assay will depend on the application. Manyassays for proteins, receptors, ligands and the like are known. Formatsinclude binding to immobilized components, cell or organismal viability,production of reporter compositions, and the like.

In the high throughput assays of the invention, it is possible to screen up to several thousand different shuffled variants in a single day. In particular, each well of a microtiter plate can be used to run a separate assay, or, if concentration or incubation time effects are to be observed, every 5-10 wells can test a single variant. Thus, a single standard microtiter plate can assay about 100 (e.g., 96) reactions. If 1536 well plates are used, then a single plate can easily assay from about 100 to about 1500 different reactions. It is possible to assay several different plates per day; assay screens of up to about 6,000-20,000 different assays (i.e., involving different nucleic acids, encoded proteins, concentrations, etc.) are possible using the integrated systems of the invention. More recently, microfluidic approaches to reagent manipulation have been developed, e.g., by Caliper Technologies (Mountain View, Calif.).
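The throughput figures above follow from simple well arithmetic, sketched here. The plate counts per day used in the example (4 and 13) are illustrative assumptions chosen to bracket the stated range; they are not values given in the text.

```python
def variants_per_plate(wells: int, wells_per_variant: int = 1) -> int:
    """Number of variants one plate can test when each variant uses several wells."""
    return wells // wells_per_variant


def assays_per_day(wells: int, plates_per_day: int) -> int:
    """Total individual assays when every well runs a separate assay."""
    return wells * plates_per_day


# A 96 well plate with 5-10 wells devoted to each variant:
few_wells = variants_per_plate(96, 5)     # 19 variants
many_wells = variants_per_plate(96, 10)   # 9 variants

# 1536 well plates, assumed plate counts per day:
low = assays_per_day(1536, 4)             # 6144 assays
high = assays_per_day(1536, 13)           # 19968 assays
```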

In one aspect, library members, e.g., cells, viral plaques, spores or the like, are separated on solid media to produce individual colonies (or plaques). Using an automated colony picker (e.g., the Q-bot, Genetix, U.K.), colonies or plaques are identified, picked, and up to 10,000 different mutants inoculated into 96 well microtiter dishes containing two 3 mm glass balls/well. The Q-bot does not pick an entire colony but rather inserts a pin through the center of the colony and exits with a small sampling of cells (or mycelia) and spores (or viruses in plaque applications). The time the pin is in the colony, the number of dips to inoculate the culture medium, and the time the pin is in that medium each affect inoculum size, and each can be controlled and optimized. The uniform process of the Q-bot decreases human handling error and increases the rate of establishing cultures (roughly 10,000/4 hours). These cultures are then shaken in a temperature and humidity controlled incubator. The glass balls in the microtiter plates act to promote uniform aeration of cells and the dispersal of mycelial fragments similar to the blades of a fermenter. Clones from cultures of interest can be cloned by limiting dilution. As also described supra, plaques or cells constituting libraries can also be screened directly for production of proteins, either by detecting hybridization, protein activity, protein binding to antibodies, or the like.

A number of well known robotic systems have also been developed for solution phase chemistries useful in assay systems. These systems include automated workstations like the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, Beckman Coulter, Inc. (Fullerton, Calif.)) which mimic the manual synthetic operations performed by a scientist. Any of the above devices are suitable for use with the present invention, e.g., for high-throughput screening of molecules assembled from the various oligonucleotide sets described herein. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein with reference to the integrated system will be apparent to persons skilled in the relevant art.

High throughput screening systems are commercially available (see, e.g., Zymark Corp., Hopkinton, Mass.; Air Technical Industries, Mentor, Ohio; Beckman Instruments, Inc. Fullerton, Calif.; Precision Systems, Inc., Natick, Mass., etc.). These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. The manufacturers of such systems provide detailed protocols for the various high throughput systems. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like.

Optical images viewed (and, optionally, recorded) by a camera or other recording device (e.g., a photodiode and data storage device) are optionally further processed in any of the embodiments herein, e.g., by digitizing the image and/or storing and analyzing the image on a computer. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a digitized video or digitized optical image, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™ or WINDOWS95™ based machines), MACINTOSH™, or UNIX based (e.g., SUN™ work station) computers. One conventional system carries light from the assay device to a cooled charge-coupled device (CCD) camera, in common use in the art. A CCD camera includes an array of picture elements (pixels). The light from the specimen is imaged on the CCD. Particular pixels corresponding to regions of the specimen (e.g., individual hybridization sites on an array of biological polymers) are sampled to obtain light intensity readings for each position. Multiple pixels are processed in parallel to increase speed. The apparatus and methods of the invention are easily used for viewing any sample, e.g., by fluorescent or dark field microscopic techniques.
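As a minimal sketch of the pixel sampling step, the following averages the pixel values falling within rectangular regions of a digitized image. The image values, the site names and the rectangular region layout are all hypothetical; an actual system would take these from the CCD output and the known geometry of the array.

```python
from statistics import mean


def region_intensity(image, top, left, height, width):
    """Mean pixel value over a rectangular region of a 2-D image (list of rows)."""
    pixels = [
        image[row][col]
        for row in range(top, top + height)
        for col in range(left, left + width)
    ]
    return mean(pixels)


# Hypothetical 4 x 4 digitized image (arbitrary intensity units).
image = [
    [10, 12, 200, 210],
    [11, 9, 205, 215],
    [8, 10, 50, 52],
    [9, 11, 48, 50],
]

# Hypothetical sites, each given as (top, left, height, width).
sites = {"site_1": (0, 0, 2, 2), "site_2": (0, 2, 2, 2), "site_3": (2, 2, 2, 2)}
readings = {name: region_intensity(image, *box) for name, box in sites.items()}
```

Each site is read independently of the others, which is what allows the regions to be processed in parallel.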

Integrated systems for assay analysis in the present invention typically include a digital computer with sequence alignment software and one or more of: high-throughput liquid control software, image analysis software, data interpretation software, and the like.

A robotic liquid control armature for transferring solutions from a source to a destination can be operably linked to the digital computer and an input device (e.g., a computer keyboard) can be used for entering data to the digital computer to control high throughput liquid transfer, oligonucleotide synthesis and the like, e.g., by the robotic liquid control armature. An image scanner can be used for digitizing label signals from labeled assay components. The image scanner interfaces with the image analysis software to provide a measurement of probe label intensity.

Of course, these assay systems can also include integrated systems incorporating oligonucleotide selection elements, such as a computer, database with nucleic acid sequences of interest, sequence alignment software, and oligonucleotide selection software. In addition, this software can include components for ordering the selected oligonucleotides, and/or directing synthesis of oligonucleotides by an operably linked oligonucleotide synthesis machine. Thus, the integrated system elements of the invention optionally include any of the above components to facilitate high throughput recombination and selection. It will be appreciated that these high-throughput recombination elements can be in systems separate from those for performing selection assays, or the two can be integrated.

In one aspect, the present invention comprises a computer or computer readable medium with an instruction set for selecting an oligonucleotide set such as a set of family shuffling oligonucleotides using the methods described herein. The instruction set aligns homologous nucleic acids to identify regions of similarity and regions of diversity (e.g., as in typical alignment software such as BLAST) and then selects a set of overlapping oligonucleotides that encompass the regions of similarity and diversity, optionally using any of the weighting factors described herein (e.g., predominant selection of oligonucleotides corresponding to one or more nucleic acid to be recombined, as in the gene blending methods herein). The computer or computer readable medium optionally comprises features facilitating use by a user, e.g., an input field for inputting oligonucleotide selections by the user, a display output system for controlling a user-viewable output (e.g., a GUI), an output file which directs synthesis of the oligonucleotides, e.g., in an automated synthesizer, and the like.
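One possible form of such an instruction set is sketched below for illustration. It assumes the homologs have already been aligned (e.g., by separate alignment software) into gap-free strings of equal length, and it omits the weighting factors. All function names, the oligo length, the overlap and the example sequences are hypothetical.

```python
def diversity_columns(aligned: list[str]) -> list[int]:
    """Indices of alignment columns at which the sequences differ."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def tile_oligos(seq: str, length: int, overlap: int) -> list[tuple[int, str]]:
    """Tile one sequence with overlapping oligos; returns (start, oligo) pairs."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    starts = range(0, max(len(seq) - overlap, 1), step)
    return [(start, seq[start:start + length]) for start in starts]


def family_oligo_set(aligned: list[str], length: int, overlap: int):
    """For each tiling window, keep one oligo per distinct parental variant."""
    oligo_set = []
    for start, _ in tile_oligos(aligned[0], length, overlap):
        variants = {seq[start:start + length] for seq in aligned}
        oligo_set.append((start, sorted(variants)))
    return oligo_set


# Two hypothetical pre-aligned homologs differing at two positions.
aligned = ["ATGGCTAAAGGTCCA", "ATGGCAAAAGGTCCT"]
diverse = diversity_columns(aligned)
oligos = family_oligo_set(aligned, length=6, overlap=2)
```

Windows falling entirely within a region of similarity yield a single oligonucleotide, while windows spanning a region of diversity yield one oligonucleotide per parental variant. A weighting factor could be applied at that point, for example by changing the relative amounts of each variant.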

EXAMPLE Betalactamase Shuffling with Three Bridging Oligos

In this example, two beta lactamase genes (CFAMPC and FOX) were shuffled using three bridging oligonucleotides. The oligos were as follows:
1) CAAATACTGGCCGGAACTGAAAGGTTCTGCTTTCGACGGT
2) GTCGTGTTCTGCAGCCGCTGGGTCTGCACCACACCTACAT
3) TCGTTACTGGCGTATCGGTGACATGACCCAGGGTCTGGGT

The recombination reaction was performed using 2 micrograms of DNAsed fragments from CFAMPC and FOX. All three oligos were added to the reaction 1:1 in a total of 60 microliters of 1×Taq-mix (7070 microliters of H₂O, 100 microliters Taq buffer, 600 microliters MgCl₂ (25 mM), 80 microliters dNTPs (100 mM)).

Reactions were performed with 150 ng primers (2× molar), 750 ng primers (10× molar), and 1500 ng primers (20× molar). 20 microliters of the assembling mix were added to 60 microliters of the 1×Taq mix and 40 thermal cycles were performed at 94° C. (30 sec), 40° C. (30 sec) and 72° C. (30 sec). 1, 2, 4, and 8 microliters of the resulting products were then PCR amplified for 40 cycles (same thermal cycling conditions as before) using primers for the end regions of the betalactamase genes. The resulting material was then digested with Sfi overnight at 50° C., gel purified and ligated into vector Sfi-BLA-Sfi (MG18), transformed into TG1 and plated on Tet 20. 50 colonies were selected from the Tet 20 plates and amplified by colony PCR. The PCR amplicon was then digested overnight at 37° C. with HinF1. Restriction analysis revealed that 2 wt sequences for each parental gene, as well as 7 different recombinant products (for the 10× molar reaction) or 8 different clones (for the 20× reaction) were produced.

EXAMPLE Creating Semisynthetic Library by Oligo Spiking

Genes to be used are cry2Aa, cry2Ab, and cry2Ac. DNA alignment was done with DNASTAR using EditSeq and MegAlign. Oligos are 50 umol synthesis (BRL, Life Technologies). Oligos for the region between amino acids 260-630 are designed for cry2Ac in regard to diversity of this region. Oligos are resuspended in 200 ul H₂O. The oligos are as follows:
CRY2-1 TGGTCGTTATTTAAATATCAAAGCCTTCTAGTATCTTCCGGCGCTAATTTATATGC
CRY2-2 CGGCGCTAATTTATATGCGAGTGGTAGTGGTCCAACACAATCATTTACAGCACA
CRY2-3 CTAATTATGTATTAAATGGTTTGAGTGGTGCTAGGACCACCATTACTTTCCCTAATATT
CRY2-4 CTTTCCCTAATATTGGTGGTCTTCCCGTCTACCACAACTCAACATTGCATTTTGCGAGG
CRY2-5 AGGATTAATTATAGAGGTGGAGTGTCATCTAGCCGCATAGGTCAAGCTAATCT
CRY2-6 CTAATCTTAATCAAAACTTTAACATTTCCACACTTTTCAATCCTTTACAAACACCGTTT
CRY2-7 TTTATTAGAAGTTGGCTAGATTCTGGTACAGATCGGGAAGGCGTTGCCACCTCTAC
CRY2-8 TGCCACCTCTACAAACTGGCAATCAGGAGCCTTTGAGACAACTTTATTA
CRY2-9 0ACAACTTTATTACGATTTAGCATTTTTTCAGCTCGTGGTAATTCGAACTTTTTCCCA
CRY2-10 TCCGTAATATTTCTGGTGTTGTTGGGACTATTAGCAACGCAGATTTAGCAAGACCTCTAC
CRY2-11 ACTTTAATGAAATAAGAGATATAGGAACGACAGCAGTCGCTAGCCTTGTAACAGTGCATA
CRY2-12 TAATATCTATGACACTCATGAAAATGGTACTATGATTCATTTAGCGCCAAATGACTATAC
CRY2-13 TATACAGGATTTACCGTATCTCCAATACATGCCACTCAAGTAAATAATCAAATTCGAAC
CRY2-14 CAAATTCGAACGTTTATTTCCGAAAAATATGGTAATCAGGGTGATTCCTTGAGATTTGA
CRY2-15 AGATTTGAGCTAAGCAACCCAACGGCTCGATACACACTTAGAGGGAATGGAAATAGTTAC
CRY2-16 AGAGTATCTTCAATAGGAAGTTCCACAATTCGAGTTACTA
CRY2-17 CTGCAAATGTTAATACTACCACAAATAATGATGGAGTACTTGATAATGGAGCTCGTTTTT
CRY2-18 TATCGGTAATGTAGTGGCAAGTGCTAATACTAATGTACCATTAGATATACAAGTGACATT
CRY2-19 ATACAAGTGACATTTAACGGCAATCCACAATTTGAGCTTATGAATATTATGTTTGTTCCA
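Several adjacent oligos in the list share terminal sequence, which can be checked with a short sketch such as the following. The function name `longest_overlap` is hypothetical. The CRY2-1 and CRY2-2 strings are copied from the list above with internal whitespace removed, on the assumption that the whitespace is a line-wrapping artifact.

```python
def longest_overlap(upstream: str, downstream: str) -> int:
    """Length of the longest suffix of `upstream` that is a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return k
    return 0


# Sequences as listed for CRY2-1 and CRY2-2, whitespace removed (assumption).
cry2_1 = "TGGTCGTTATTTAAATATCAAAGCCTTCTAGTATCTTCCGGCGCTAATTTATATGC"
cry2_2 = "CGGCGCTAATTTATATGCGAGTGGTAGTGGTCCAACACAATCATTTACAGCACA"

shared = longest_overlap(cry2_1, cry2_2)
shared_sequence = cry2_2[:shared]
```

For this pair the shared stretch is the 18 nucleotides CGGCGCTAATTTATATGC at the 3′ end of CRY2-1 and the 5′ end of CRY2-2.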

Family shuffling is done using the assembly conditions described in Crameri et al. (1998) Nature 391: 288-291, except that oligos are spiked into the assembling mix as described in Crameri and Stemmer (1995) BioTechniques 18(2): 194-196. The PCR reactions with outside primer 1 for ATGAATAATGTATTGAATA and 1 rev TTAATAAAGTGGTGGAAGATT are done with Taq/Pfu (9:1) mix (Taq from Qiagen, Pfu from Stratagene), PCR program 96° C. (30 sec), 50° C. (30 sec), 72° C. (1 min) for 25 cycles. The reaction is diluted 10× and an additional cycle is performed. The gene is ligated into a vector and transformed into TG1 competent cells, and plated on LB+Amp100 plates. Single colonies are picked for colony PCR and then analyzed by restriction digestion.

EXAMPLE Oligo Shuffling of Libraries

An advantage of oligonucleotide mediated shuffling methods is the ability to recombine nucleic acids between libraries of oligos generated for a number of different sites in a gene of interest. Generating libraries with complex combinations of randomizations in different regions of a target gene is facilitated by oligonucleotide mediated shuffling approaches.

For example, the antigen-binding site of an antibody or antibody fragment such as a single-chain Fv (ScFv), or Fab is mainly comprised of 6 complementarity-determining regions (CDR's). These CDR's are present on one face of the properly folded molecule, but are separated in the linear gene sequence. Synthetic oligonucleotides or those generated by PCR of one or more antibody genes can be used to generate sequence diversity at individual CDR's. This process can be repeated with a second CDR, then a third, until a library of diverse antibodies is formed. DNA shuffling formats have a distinct advantage in that they allow libraries of each CDR to be generated simultaneously, and inter-CDR recombination events will frequently occur to potentially generate all possible combinations of different CDR's. Recursive DNA shuffling and screening for an improved trait or property can be used to optimize protein function.
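The size of the combinatorial space that inter-CDR recombination can in principle reach is the product of the individual library sizes, as the following sketch illustrates. The library names, variant labels and library sizes are hypothetical placeholders rather than data from this example.

```python
from itertools import product
from math import prod


def count_combinations(library_sizes: list[int]) -> int:
    """Number of distinct combinations, taking one member from each library."""
    return prod(library_sizes)


def enumerate_combinations(libraries: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Every combination of one variant per region, in library order."""
    return list(product(*libraries.values()))


# Hypothetical libraries for three of the CDR's.
libraries = {
    "CDR1": ["v1", "v2"],
    "CDR2": ["v1", "v2", "v3"],
    "CDR3": ["v1", "v2"],
}
total = count_combinations([len(members) for members in libraries.values()])
combos = enumerate_combinations(libraries)
```

Because the count grows multiplicatively with each added region, building the combinations one CDR at a time quickly becomes laborious, whereas simultaneous recombination can sample them in a single library.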

Similarly, the 3-dimensional structures of many cytokines share a common 4-helix bundle structure with long connecting loops. The receptor binding sites for some of these proteins have been determined and are localized to 2 or more regions of the protein that are separate in the linear gene sequence. Modeling of related proteins could be used to predict functional regions of unknown proteins for targeting libraries. Libraries in each of these regions can be generated using synthetic oligos, family-shuffling oligos, fragments of homologous genes, or combinations thereof as herein. Oligonucleotide mediated shuffling allows one to generate libraries in each of these regions simultaneously and to generate recombinants between each library. In this way, combinations between members of each library can be screened for improved function. Those isolates with improved function can then be submitted to successive rounds of DNA shuffling. In this way, isolates with the highest activity in each library and potential synergies between members of different libraries can be selected. Other methods that optimize each library independently may fail to isolate such synergistic interactions.

Another example is the shuffling of enzymes where the active site and substrate binding site(s) are comprised of residues close together in the 3-dimensional structure of the folded protein, but separated in the linear sequence of the gene. DNA shuffling can simultaneously generate libraries in each region that interacts with substrate. DNA shuffling also allows all possible combinations of changes between each library to be generated and evaluated for an improved trait or property.

Modifications can be made to the method and materials as hereinbefore described without departing from the spirit or scope of the invention as claimed, and the invention can be put to a number of different uses, including:

The use of an integrated system to select family shuffling oligonucleotides (e.g., by a process which includes sequence alignment of parental nucleic acids) and to test shuffled nucleic acids for activity, including in an iterative process.

An assay, kit or system utilizing a use of any one of the selection strategies, materials, components, methods or substrates hereinbefore described. Kits will optionally additionally comprise instructions for performing methods or assays, packaging materials, one or more containers which contain assay, device or system components, or the like.

In an additional aspect, the present invention provides kits embodying the methods and apparatus herein. Kits of the invention optionally comprise one or more of the following: (1) a recombination component as described herein; (2) instructions for practicing the methods described herein, and/or for operating the oligonucleotide synthesis or assembled gene selection procedures herein; (3) one or more assay component; (4) a container for holding nucleic acids or enzymes, other nucleic acids, transgenic plants, animals, cells, or the like; (5) packaging materials; and (6) a computer or computer readable medium having instruction sets for aligning target nucleic acids and for selecting oligonucleotides which, upon hybridization and elongation, will result in shuffled forms of the target nucleic acids.

In a further aspect, the present invention provides for the use of any component or kit herein, for the practice of any method or assay herein, and/or for the use of any apparatus or kit to practice any assay or method herein.

While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques and materials described above can be used in various combinations. All publications and patent documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent document were so individually denoted.

1. A method of recombining homologous nucleic acids, the method comprising: (i) hybridizing a set of family gene shuffling oligonucleotides; and, (ii) elongating the set of family gene shuffling oligonucleotides, thereby providing a population of recombined nucleic acids.

2-104. (canceled)